Method and apparatus for intra-prediction using interpolation filters

The method optimizes intra-prediction in video coding by selecting a subpixel interpolation filter based on subpixel offset and mode, reducing memory and computational complexity in video coding standards like HEVC and VVC.

JP7874218B2 · Active · Publication Date: 2026-06-15 · HUAWEI TECH CO LTD

Cites: 3 · Cited by: 0

Patent Information

Authority / Receiving Office: JP · JP
Patent Type: Patents
Current Assignee / Owner: HUAWEI TECH CO LTD
Filing Date: 2025-05-14
Publication Date: 2026-06-15

Application Information

Patent Timeline

14 May 2025: Application
15 Jun 2026: Publication (JP7874218B2)

IPC: H04N19/117; H04N19/11; H04N19/157; H04N19/176; H04N19/59; H04N19/80

CPC: H04N19/107; H04N19/117; H04N19/182; H04N19/20; H04N19/42; H04N19/82; H04N19/105; H04N19/159

AI Tagging

Application Domain

Digital video signal modification



AI Technical Summary

Technical Problem

Existing video coding standards, such as HEVC and VVC, face challenges in intra-prediction efficiency due to complex mode sets and non-adaptive index lists, leading to increased computational complexity and memory requirements.

Method used

A method and apparatus for intra-prediction that uses a mapping process to select a subpixel interpolation filter based on the subpixel offset and intra-prediction mode, determining the size of the primary reference side to optimize memory usage and simplify computations.

Benefits of technology

This approach reduces memory requirements and enhances computational efficiency in video coding by adaptively selecting the reference side and interpolation filter, facilitating more efficient image/video encoding and decoding.

✦ Generated by Eureka AI based on patent content.



Abstract

To provide a method, an apparatus, a computer program product, and a non-transitory computer-readable medium for video coding. SOLUTION: A method comprises performing an intra-prediction process on a block comprising samples to be predicted. An interpolation filter is applied to reference samples of the block during the intra-prediction process of the block. The interpolation filter is selected on the basis of a subpixel offset between the reference samples and the samples to be predicted. The size of a main reference side used in the intra-prediction process is determined according to two factors: the length of the interpolation filter, and the intra-prediction mode that provides, out of a set of available intra-prediction modes, the greatest non-integer value of said subpixel offset. The main reference side comprises the reference samples. SELECTED DRAWING: Figure 2


Description

[Technical Field]

[0001] Cross-reference to related applications: This patent application claims priority to the following U.S. provisional patent applications:
- No. 62/742,300, filed on October 6, 2018;
- No. 62/744,096, filed on October 10, 2018;
- No. 62/753,055, filed on October 30, 2018;
- No. 62/757,150, filed on November 7, 2018.

This is a divisional application of Japanese Patent Application No. 2023-045836. That application is itself a divisional application of Japanese Patent Application No. 2021-518698 (Japanese Patent No. 7250917). The aforementioned patent applications are incorporated herein by reference in their entirety.

[0002] This disclosure relates to the technical field of image and/or video coding and decoding. More particularly, it relates to a method and apparatus for directional intra-prediction in which reference sample processing is harmonized with the length of the interpolation filter.

[Background Technology]

[0003] Digital video has been widely used since the introduction of DVD discs. Video is encoded before transmission and then transmitted over a transmission medium. Viewers receive the video and use viewing devices to decode and display it. Over the years, video quality has improved, for example through higher resolution, color depth, and frame rates. This has resulted in larger data streams, which are now typically transported over the internet and mobile communication networks.

[0004] However, higher resolution video typically contains more information and therefore requires greater bandwidth. To reduce bandwidth requirements, video coding standards have been introduced that involve video compression. When video is encoded, bandwidth requirements (or, in the case of storage, corresponding memory requirements) are reduced. Often, this reduction comes at the expense of quality. Therefore, video coding standards attempt to find a balance between bandwidth requirements and quality.

[0005] High Efficiency Video Coding (HEVC) is an example of a video coding standard generally known to those skilled in the art. In HEVC, a coding unit (CU) is split into prediction units (PUs) or transform units (TUs).

Versatile Video Coding (VVC) is the most recent joint video project of the ITU-T Video Coding Experts Group (VCEG) and the ISO/IEC Moving Picture Experts Group (MPEG). The two groups work together in a partnership known as the Joint Video Exploration Team (JVET). VVC is also referred to as the ITU-T H.266/Next Generation Video Coding (NGVC) standard.

VVC removes the concept of multiple partition types, that is, the separation of the CU, PU, and TU concepts. The only exception is where this separation is needed for CUs whose size is too large for the maximum transform length. As a result, VVC supports more flexible CU partition shapes.

[0006] The processing of these coding units (CUs), also called blocks, depends on their size, spatial location, and the coding mode specified by the encoder. Coding modes can be classified into two groups according to the type of prediction, namely intra-prediction mode and inter-prediction mode. Intra-prediction mode uses samples from the same picture (also called frame or image) to generate reference samples for calculating predicted values for samples in the block being reconstructed. Intra-prediction is also called spatial prediction. Inter-prediction mode is designed for temporal prediction and uses reference samples from the previous or next picture to predict samples in the block of the current picture.

[0007] The ITU-T VCEG (Q6 / 16) and ISO / IEC MPEG (JTC 1 / SC29 / WG11) are exploring the potential need for standardization of future video coding technologies that offer significantly greater compression capabilities than the current HEVC standard (including its current and upcoming extensions for screen content coding and high dynamic range coding). The groups are collaborating in this exploration activity within a joint research initiative called the Joint Video Exploration Team (JVET) to evaluate compression technology designs proposed by their experts in this area.

[0008] The VTM (Versatile Test Model) uses 35 intra modes, while the BMS (Benchmark Set) uses 67 intra modes.

[0009] The intra mode coding scheme currently described in the BMS is considered complex. A further disadvantage of the non-selected mode set is that its index list is always constant and is not adapted to the properties of the current block (for example, the intra modes of its neighboring blocks).

[Summary of the Invention]

[Means for Solving the Problem]

[0010] Embodiments of this application are disclosed that provide an apparatus and method for intra-prediction. The apparatus and method simplify the computational procedure for intra-prediction by using a mapping process to improve coding efficiency. The scope of protection is defined by the claims.

[0011] The above and other objectives are achieved by the subject matter of the independent claims. Further implementations are evident from the dependent claims, description, and figures.

[0012] Particular embodiments are outlined in the attached independent claims, with other embodiments in the dependent claims.

[0013] According to a first aspect, the present invention relates to a method for video coding, performed by an encoding device or a decoding device.

The method includes performing an intra-prediction process on a block. Examples include a block containing samples to be predicted, or a block of predicted samples, such as a luma block containing luma samples to be predicted. During this process, a subpixel interpolation filter is applied to reference samples, for example luma reference samples or chroma reference samples.

The subpixel interpolation filter is selected based on a subpixel offset. This may be the offset between the position of a reference sample and the position of an interpolated sample, or the offset between a reference sample and a sample to be predicted.

The size of the main reference side used in the intra-prediction process is determined according to two factors:
- the length of the subpixel interpolation filter;
- the intra-prediction mode that yields the maximum value of the subpixel offset (for example, its maximum non-integer value), for example one of the modes in the set of available intra-prediction modes.

The main reference side comprises the reference samples.

[0014] A reference sample is a sample on the basis of which a prediction (here, an intra prediction) is performed. In other words, a reference sample is a sample outside the (current) block that is used to predict the samples of the (current) block. The term "current block" refers to the block on which the processing, including the prediction, is performed. For example, a reference sample is a sample adjacent to the block on one or more of its sides. In other words, a reference sample used to predict the current block may be located in a line of samples that is at least partially adjacent and parallel to one or more block boundaries (sides).

[0015] The reference sample may be a sample at an integer sample position, or an interpolated sample at a subsample position, such as a non-integer position. An integer sample position may refer to the actual sample position in the image to be coded (encoded or decoded).

[0016] The reference side is the side of the block from which reference samples are used to predict the samples of the block. The main reference side is the side of the block from which the reference samples are taken (in some embodiments, there is only one side from which the reference samples are taken). However, generally, the main reference side may refer to the side from which the reference samples are mainly taken (for example, most of the reference samples are taken from it, or the reference samples for predicting most of the block samples are taken from it). The main reference side includes the reference samples used to predict the samples of the block. If the main reference side consists of the reference samples used to predict the samples of the block and all of those reference samples used to predict the samples of the block are included in the main reference side, that can be advantageous for memory saving purposes. However, the present disclosure is also generally applicable with the main reference side including the reference samples used to predict the block. These may include the reference samples directly used for prediction and the reference samples used for filtering to obtain sub-samples that are then used for predicting the block samples.

[0017] Generally, the reference samples of the current block comprise the neighboring reconstructed samples of the current block. Thus, if the current block is a chroma block, the chroma reference samples of the current chroma block comprise its neighboring reconstructed samples. Likewise, if the current block is a luma block, the luma reference samples of the current luma block comprise its neighboring reconstructed samples.

[0018] The memory requirements are determined by the maximum value of the subpixel offset. Determining the size of the main reference side according to the present disclosure therefore facilitates memory efficiency in video coding using intra prediction. Specifically, when the size of the main reference side is determined according to the first aspect described above, less memory is needed to provide (store) the reference samples used to predict blocks. This in turn can lead to a more efficient implementation of intra prediction for image/video encoding and decoding.

[0019] In a possible implementation of the method according to such a first aspect, the interpolation filter is selected based on the sub-pixel offset between the position of the reference sample and the position of the predicted sample.

[0020] It is understood that the predicted samples are samples to be interpolated in that they are based on the output of the interpolation process.

[0021] In a possible implementation of the method according to the first aspect, the subpixel offset is determined in one of the following ways:
- based on a reference line (such as refIdx);
- based on intraPredAngle, which depends on the selected intra prediction mode;
- based on the distance from the side of the block of reference samples (such as a reference line) to the side of the block of predicted samples.

[0022] In a possible implementation of the method according to the first aspect, the maximum value of the subpixel offset is the maximum non-integer subpixel offset (such as the maximum fractional subpixel offset, i.e., the maximum non-integer value of the subpixel offset). The size of the main reference side is selected to be equal to the sum of the following:
- the integer part of the maximum non-integer subpixel offset;
- the size of the side of the block of predicted samples;
- a part or the whole of the length of the interpolation filter (such as half the length of the interpolation filter).
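The size rule of paragraph [0022] can be written as a small function. This is a minimal sketch under two assumptions. First, the maximum non-integer subpixel offset is given in 1/32-sample units, the resolution used by the coefficient tables. Second, the "part of the filter length" is taken as half of it. The function name is hypothetical.

```python
def main_reference_size(max_offset_32: int, side: int, filter_len: int) -> int:
    """Size of the main reference side: integer part of the maximum
    non-integer subpixel offset + block side + half the filter length.

    max_offset_32: maximum non-integer subpixel offset in 1/32-sample units
    side:          length of the relevant block side (width or height)
    filter_len:    number of interpolation filter taps (e.g. 4)
    """
    if max_offset_32 % 32 == 0:
        raise ValueError("expected a non-integer offset (not a multiple of 32)")
    int_part = max_offset_32 >> 5  # drop the 5 fractional bits
    return int_part + side + filter_len // 2


# Offset 232/32 = 7.25 samples, 8-sample side, 4-tap filter: 7 + 8 + 2
print(main_reference_size(232, 8, 4))  # 17
```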

[0023] One advantage of this choice of main reference side size is that all the samples necessary for intra-prediction of the block are prepared (stored/buffered). At the same time, fewer samples that are not used to predict the block (samples) are prepared (stored/buffered).

[0024] In one possible implementation of the method according to the first aspect, the side of the block of predicted samples is chosen as follows:
- if the intra-prediction mode is greater than the vertical intra-prediction mode (VER_IDX), it is the width of that block;
- if the intra-prediction mode is smaller than the horizontal intra-prediction mode (HOR_IDX), it is the height of that block.

[0025] For example, in Figure 10, VER_IDX corresponds to vertical intra-prediction mode #50, and HOR_IDX corresponds to horizontal intra-prediction mode #18.
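The side selection of paragraphs [0024]-[0025] can be sketched as follows, assuming the mode numbering of Figure 10 (VER_IDX = 50, HOR_IDX = 18). The text only specifies the two cases shown. The sketch therefore rejects other modes rather than guessing a rule for them. The function name is hypothetical.

```python
# Mode numbers taken from Figure 10.
HOR_IDX = 18  # horizontal intra-prediction mode
VER_IDX = 50  # vertical intra-prediction mode


def block_side_for_mode(mode: int, width: int, height: int) -> int:
    """Return the block side used when sizing the main reference side."""
    if mode > VER_IDX:
        return width   # side = width of the block of predicted samples
    if mode < HOR_IDX:
        return height  # side = height of the block of predicted samples
    # Paragraph [0024] does not define the side for HOR_IDX..VER_IDX.
    raise ValueError(f"mode {mode} is not covered by this rule")


print(block_side_for_mode(66, 16, 8))  # 16
```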

[0026] In one possible implementation of the method according to the first aspect, a reference sample of the main reference side whose position is larger than twice the size of the block side is set equal to the sample located at twice that size.

[0027] In other words, this is right padding: positions beyond twice the side length are filled by repeating the sample at twice the side length. The memory buffer size is preferably a power of two. It is therefore better to reuse the last sample of a power-of-two buffer (i.e., the sample located at twice the size) than to maintain a buffer whose size is not a power of two.
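The padding rule of paragraphs [0026]-[0027] amounts to clamping the read position. This is a minimal sketch. It assumes the main reference side is held in a Python list indexed by position, starting at 0. The function name is hypothetical.

```python
def clamp_to_double_side(ref: list[int], side: int) -> list[int]:
    """Right-pad the main reference side: every position beyond 2*side
    takes the value of the sample at position 2*side."""
    limit = 2 * side
    if len(ref) <= limit:
        raise ValueError("reference buffer must contain position 2*side")
    return [ref[min(i, limit)] for i in range(len(ref))]


print(clamp_to_double_side(list(range(10)), 3))  # [0, 1, 2, 3, 4, 5, 6, 6, 6, 6]
```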

[0028] In one possible implementation of the method according to the first aspect, the size of the main reference side is determined as the sum of three terms:
(i) the main side length of the block;
(ii) the length of the interpolation filter, or a part of it (such as half the length of the interpolation filter), minus 1;
(iii) the larger of the following two values M:
- the main side length of the block;
- the integer part of the maximum (greatest) non-integer subpixel offset plus a part or all of the length of the interpolation filter (such as half its length), or, in a variant, that value plus 1.

[0029] One advantage of this choice of main reference side size is that all the samples required for intra-prediction of the block are prepared (stored/buffered). At the same time, the preparation (storage/buffering) of samples that are not used to predict the block (or its samples) is reduced or even avoided.

[0030] Note that “block main side,” “block side length,” “block main side length,” and “size of the block side of the predicted sample” refer to the same concept throughout this disclosure.

[0031] In one possible implementation of the method according to the first aspect, right padding depends on which value gives the maximum of the two values M:
- if M equals the main side length of the block, right padding is not performed;
- if M equals the integer part of the maximum non-integer subpixel offset plus half the length of the interpolation filter (or, in the variant, that value plus 1), right padding is performed.
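The sum described in paragraph [0028] and the padding decision of paragraph [0031] can be sketched together. This version assumes two things. First, "a part of the filter length" means half of it. Second, the integer part of the maximum non-integer subpixel offset has already been computed. When the two candidate values of M are equal, the sketch treats M as equal to the block side, so no padding is performed. The function name is hypothetical.

```python
def main_reference_size_with_m(side: int, int_part: int, filter_len: int,
                               plus_one: bool = False) -> tuple[int, bool]:
    """Main reference side size = side + (half filter length - 1) + M,
    where M = max(side, int_part + half filter length [+ 1]).

    Returns (size, right_padding_needed)."""
    half = filter_len // 2
    offset_term = int_part + half + (1 if plus_one else 0)
    m = max(side, offset_term)
    size = side + (half - 1) + m
    # No right padding when M equals the block side; padding otherwise.
    return size, m != side


print(main_reference_size_with_m(8, 7, 4))  # (18, True)
```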

[0032] In one possible implementation, padding is performed by repeating the first and/or last reference samples of the main reference side on the left and/or right side, respectively. Specifically, denote the main reference side as ref and its size as refS. The padding is then expressed as ref[-1]=p[0] and/or ref[refS+1]=p[refS], where:
- ref[-1] is the value to the left of the main reference side;
- p[0] is the value of the first reference sample of the main reference side;
- ref[refS+1] is the value to the right of the main reference side;
- p[refS] is the value of the last reference sample of the main reference side.

[0033] In other words, right padding can be performed by ref[refS+1]=p[refS]. As an addition or alternative, left padding can be performed by ref[-1]=p[0].
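The padding expressions of paragraphs [0032]-[0033] can be sketched with a dictionary keyed by position, which makes the index -1 explicit. The sketch follows the text's indexing literally, where the last reference sample is p[refS]. It therefore assumes p holds refS+1 samples, at indices 0..refS. The function name is hypothetical.

```python
def pad_main_reference(p: list[int]) -> dict[int, int]:
    """Build ref[-1..refS+1] from reference samples p[0..refS]:
    ref[-1] = p[0] (left padding), ref[refS+1] = p[refS] (right padding)."""
    ref_s = len(p) - 1  # index of the last sample, following the text's p[refS]
    ref = {i: p[i] for i in range(ref_s + 1)}
    ref[-1] = p[0]
    ref[ref_s + 1] = p[ref_s]
    return ref


r = pad_main_reference([10, 20, 30])
print([r[i] for i in range(-1, 4)])  # [10, 10, 20, 30, 30]
```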

[0034] In this way, padding, taking interpolation filtering into account, can facilitate the preparation of all the samples necessary for prediction.

[0035] In one possible implementation of the method according to the first aspect, the filters used in the intra-prediction process are finite impulse response (FIR) filters whose coefficients are fetched from a lookup table.

[0036] In one possible implementation of the method according to the first aspect, the interpolation filter used in the intra-prediction process is a 4-tap filter.
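Paragraphs [0035]-[0036] describe a 4-tap FIR interpolation filter whose coefficients come from a lookup table indexed at 1/32 resolution. The sketch below interpolates one sample under these assumptions:
- each coefficient row sums to 64, so the result is normalized with a rounded right shift by 6;
- the four taps cover ref[i-1]..ref[i+2] around the integer position i;
- clipping to the sample bit depth is omitted.

The coefficients of Tables 1-4 are not reproduced here. `LINEAR_LUT` is a hypothetical stand-in built from linear-interpolation weights, and `interpolate_4tap` is a hypothetical name.

```python
def interpolate_4tap(ref: list[int], pos_32: int,
                     lut: list[tuple[int, int, int, int]]) -> int:
    """Interpolate one sample at position pos_32 (1/32-sample units) along ref.

    The integer part selects the base sample; the 5-bit fractional part
    selects a row of 4 FIR coefficients from the lookup table."""
    i_idx = pos_32 >> 5   # integer sample position
    i_fact = pos_32 & 31  # fractional phase, 0..31
    if i_idx < 1 or i_idx + 2 >= len(ref):
        raise IndexError("4-tap support ref[i_idx-1..i_idx+2] out of range")
    coeffs = lut[i_fact]
    acc = sum(c * ref[i_idx - 1 + k] for k, c in enumerate(coeffs))
    return (acc + 32) >> 6  # round and normalize by 64


# Hypothetical LUT: linear interpolation written as 4 taps (not Tables 1-4).
LINEAR_LUT = [(0, 2 * (32 - f), 2 * f, 0) for f in range(32)]

print(interpolate_4tap([10, 20, 30, 40, 50], 48, LINEAR_LUT))  # 25
```

Swapping in a cubic or Gaussian table only changes the `lut` argument. The filtering loop stays the same.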

[0037] In one possible implementation of the method according to the first aspect, the coefficients of the interpolation filter are as follows:

[0038] [Table 1]

[0039] As shown above, the "Subpixel Offset" column is defined at 1/32 subpixel resolution and is indexed by the subpixel offset, for example its non-integer part. In other words, the interpolation filter (such as a subpixel interpolation filter) is represented by the coefficients in the table above.

[0040] In one possible implementation of the method according to the first aspect, the coefficients of the interpolation filter are as follows:

[0041] [Table 2]

[0042] As shown above, the "Subpixel Offset" column is defined at 1/32 subpixel resolution and is indexed by the subpixel offset, for example its non-integer part. In other words, the interpolation filter (such as a subpixel interpolation filter) is represented by the coefficients in the table above.

[0043] In one possible implementation of the method according to the first aspect, the coefficients of the interpolation filter are as follows:

[0044] [Table 3]

[0045] As shown above, the "Subpixel Offset" column is defined at 1/32 subpixel resolution and is indexed by the subpixel offset, for example its non-integer part. In other words, the interpolation filter (such as a subpixel interpolation filter) is represented by the coefficients in the table above.

[0046] In one possible implementation of the method according to the first aspect, the coefficients of the interpolation filter are as follows:

[0047] [Table 4]

[0048] As shown above, the "Subpixel Offset" column is defined at 1/32 subpixel resolution and is indexed by the subpixel offset, for example its non-integer part. In other words, the interpolation filter (such as a subpixel interpolation filter) is represented by the coefficients in the table above.

[0049] In one possible implementation of the method according to the first aspect, the subpixel interpolation filter for a given subpixel offset is selected from a set of filters used for the intra-prediction process. In other words, the filter applied in the intra-prediction process for a given subpixel offset is selected from a set of filters. For example, a single filter may be used, or one filter out of a set of filters.

[0050] In one possible implementation of the method according to the first aspect, the set of filters comprises a Gaussian filter and a cubic filter.

[0051] In one possible implementation of the method according to the first aspect, the number of subpixel interpolation filters is N. The N subpixel interpolation filters are used for intra reference sample interpolation, and N is a positive integer (N >= 1).

[0052] In one possible implementation of the method according to the first aspect, the reference samples used to obtain the values of the predicted samples in a block are not adjacent to the block of predicted samples. The encoder may signal an offset value in the bitstream. This offset value indicates the distance between the adjacent line of reference samples and the line of reference samples from which the values of the predicted samples are derived. Figure 24 shows the possible positions of the reference sample lines and the corresponding values of the ref_offset variable. The variable ref_offset indicates which reference line is used; for example, ref_offset=0 means that "reference line 0" (as shown in Figure 24) is used.

[0053] Examples of offset values used in a specific implementation of a video codec (for example, a video encoder or video decoder) are as follows:
- using the adjacent line of reference samples (ref_offset=0, indicated by "reference line 0" in Figure 24);
- using the first line, i.e., the line closest to the adjacent line (ref_offset=1, indicated by "reference line 1" in Figure 24);
- using the third line (ref_offset=3, indicated by "reference line 3" in Figure 24).
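Paragraphs [0052]-[0053] can be illustrated by fetching the top reference line for a given ref_offset. This is a minimal sketch with these assumptions:
- the picture is a list of sample rows;
- the block's top-left sample is at (x0, y0);
- the line starts at the up-left corner sample, a convention borrowed from VVC-style ref[x] = p[-1-refIdx+x][-1-refIdx] indexing.

The function name is hypothetical. The left reference column would be fetched analogously.

```python
def top_reference_line(picture: list[list[int]], x0: int, y0: int,
                       length: int, ref_offset: int) -> list[int]:
    """Return `length` samples of the reference line above the block whose
    top-left sample is at (x0, y0). ref_offset=0 is the adjacent line
    ("reference line 0"); larger values move further away from the block.
    The line starts at the column diagonally up-left of the block."""
    y = y0 - 1 - ref_offset
    x_start = x0 - 1 - ref_offset
    if y < 0 or x_start < 0 or x_start + length > len(picture[y]):
        raise IndexError("reference line lies outside the picture")
    return picture[y][x_start:x_start + length]


# Toy 6x6 picture whose sample value encodes its position as 10*y + x.
pic = [[10 * y + x for x in range(6)] for y in range(6)]
print(top_reference_line(pic, 3, 3, 4, 1))  # [11, 12, 13, 14]
```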

[0054] The directional intra-prediction mode specifies the value of the subpixel offset (deltaPos) between two adjacent lines of prediction samples. This value is represented as a fixed-point integer with 5 fractional bits (1/32-sample precision). For example, deltaPos=32 means that the offset between two adjacent lines of prediction samples is exactly one sample.
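The 5-fractional-bit representation of paragraph [0054] means that an accumulated offset splits into an integer sample count (a right shift by 5) and a 1/32 phase (the low 5 bits). This is a minimal sketch. It assumes the offset after `line` lines of prediction samples is line * deltaPos. The function name is hypothetical.

```python
def subpixel_offset(line: int, delta_pos: int) -> tuple[int, int]:
    """Offset accumulated after `line` lines of prediction samples,
    split into whole samples and a fractional part in 1/32 units."""
    pos = line * delta_pos  # 5 fractional bits: 32 == one sample
    return pos >> 5, pos & 31


print(subpixel_offset(3, 29))  # (2, 23): 87/32 = 2 samples + 23/32
```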

[0055] If the intra-prediction mode is greater than DIA_IDX (mode #34), the size of the main reference side is calculated for the example described above as follows:
1. From the set of available intra-prediction modes (i.e., the modes that the encoder may indicate for a block of prediction samples), consider the mode that is greater than DIA_IDX and provides the largest deltaPos value.
2. Obtain the candidate subpixel offset by adding ref_offset to the block height and multiplying the sum by that deltaPos value.
3. If this product is divisible by 32 (the remainder is 0, i.e., the offset is an integer), consider the mode with the next-largest deltaPos value in the same way, skipping modes already considered. Otherwise, take the product as the largest non-integer subpixel offset.
4. Obtain the integer part of this offset by shifting it right by 5 bits.
5. The size of the main reference side is the sum of this integer part, the width of the block of prediction samples, and half the length of the interpolation filter.

[0056] Conversely, if the intra-prediction mode is smaller than DIA_IDX (mode #34), the size of the main reference side is calculated for the example described above as follows:
1. From the set of available intra-prediction modes (i.e., the modes that the encoder may indicate for a block of prediction samples), consider the mode that is smaller than DIA_IDX and provides the largest deltaPos value.
2. Obtain the candidate subpixel offset by adding ref_offset to the block width and multiplying the sum by that deltaPos value.
3. If this product is divisible by 32 (the remainder is 0, i.e., the offset is an integer), consider the mode with the next-largest deltaPos value in the same way, skipping modes already considered. Otherwise, take the product as the largest non-integer subpixel offset.
4. Obtain the integer part of this offset by shifting it right by 5 bits.
5. The size of the main reference side is the sum of this integer part, the height of the block of prediction samples, and half the length of the interpolation filter.
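The procedure of paragraphs [0055]-[0056] can be sketched as one function. Passing height as `cross_side` and width as `main_side` gives the case above DIA_IDX; swapping them gives the case below. The deltaPos values in the example are hypothetical inputs, not a table from this document. The function name is also hypothetical.

```python
def main_reference_size_from_modes(delta_pos_values: list[int], cross_side: int,
                                   main_side: int, ref_offset: int,
                                   filter_len: int) -> int:
    """Size the main reference side from the largest non-integer offset.

    delta_pos_values: deltaPos of the available modes on one side of DIA_IDX
    cross_side:       block height (modes > DIA_IDX) or width (modes < DIA_IDX)
    main_side:        block width (modes > DIA_IDX) or height (modes < DIA_IDX)
    """
    for delta_pos in sorted(delta_pos_values, reverse=True):
        offset = (cross_side + ref_offset) * delta_pos
        if offset % 32 != 0:  # largest non-integer subpixel offset found
            return (offset >> 5) + main_side + filter_len // 2
        # Integer offset: skip this mode and try the next-largest deltaPos.
    raise ValueError("no available mode yields a non-integer offset")


# 8x8 block, ref_offset 0, 4-tap filter: 8*32 is an integer offset (skipped),
# 8*29 = 232 -> integer part 7, so the size is 7 + 8 + 2.
print(main_reference_size_from_modes([32, 29, 26], 8, 8, 0, 4))  # 17
```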

[0057] According to a second aspect, the present invention relates to an intra-prediction method for predicting a current block contained in a picture.

The method comprises determining the size of the main reference side used in the intra-prediction based on two factors:
- The intra-prediction mode, among a plurality of available intra-prediction modes, that yields the largest non-integer value of the subpixel offset. This offset is measured between a target sample among a plurality of target samples in the current block (such as the current sample among a plurality of current samples) and the reference sample used to predict that target sample. That reference sample is one of a plurality of reference samples contained in the main reference side.
- The size of an interpolation filter to be applied to the reference samples contained in the main reference side.

The method further comprises applying the interpolation filter to the reference samples contained in the main reference side to obtain filtered reference samples. Finally, a plurality of samples contained in the current block (such as a plurality of current samples or target samples) are predicted based on the filtered reference samples.

[0058] Therefore, this disclosure facilitates memory efficiency in video coding using intra-prediction.

[0059] For example, the size of the main reference side is determined as the sum of the integer part of the largest non-integer value of the subpixel offset, the size of the side of the current block, and half the size of the interpolation filter. Accordingly, the advantages of the second aspect may correspond to the advantages of the first aspect described above.

[0060] In some embodiments, if the intra-prediction mode is greater than the vertical intra-prediction mode VER_IDX, the side of the current block is the width of the current block, or if the intra-prediction mode is less than the horizontal intra-prediction mode HOR_IDX, the side of the current block is the height of the current block.

[0061] For example, consider a reference sample whose position in the main reference side is larger than twice the size of the current block side. Its value is set equal to the value of the sample at the position equal to twice the size of the current block side.

[0062] For example, the size of the main reference side is determined as the sum of three terms:
(i) the size of the side of the current block;
(ii) half the length of the interpolation filter minus 1;
(iii) the larger of the following two values:
- the size of the side of the block;
- either of the following two variants:
  - the integer part of the maximum non-integer value of the subpixel offset plus half the length of the interpolation filter. In this case, the additional samples ref[refW+refIdx+x], with x=1..(Max(1, nTbW/nTbH)*refIdx+1), are derived as ref[refW+refIdx+x]=p[-1+refW][-1-refIdx].
  - the integer part of the maximum non-integer value of the subpixel offset plus half the length of the interpolation filter plus 1. In this case, the additional samples ref[refW+refIdx+x], with x=1..(Max(1, nTbW/nTbH)*refIdx+2), are derived as ref[refW+refIdx+x]=p[-1+refW][-1-refIdx].
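The additional-sample derivation in paragraph [0062] repeats one sample, p[-1+refW][-1-refIdx], a computed number of times. This is a minimal sketch. It assumes ref already holds indices 0..refW+refIdx laid out as ref[x] = p[-1-refIdx+x][-1-refIdx]. Under that layout, ref[refW+refIdx] is exactly the repeated sample. The "/" in Max(1, nTbW/nTbH) is taken as integer division. The function name is hypothetical.

```python
def extend_with_additional_samples(ref: list[int], ref_w: int, ref_idx: int,
                                   n_tb_w: int, n_tb_h: int,
                                   plus_one: bool = False) -> list[int]:
    """Append ref[refW+refIdx+x] = p[-1+refW][-1-refIdx] for
    x = 1..(Max(1, nTbW/nTbH)*refIdx + 1), or ... + 2 in the "+1" variant."""
    if len(ref) != ref_w + ref_idx + 1:
        raise ValueError("ref must hold indices 0..refW+refIdx")
    count = max(1, n_tb_w // n_tb_h) * ref_idx + (2 if plus_one else 1)
    last = ref[ref_w + ref_idx]  # equals p[-1+refW][-1-refIdx] under the layout
    return ref + [last] * count


print(extend_with_additional_samples([1, 2, 3, 4, 5], 4, 0, 4, 4))  # [1, 2, 3, 4, 5, 5]
```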

[0063] According to a third aspect, the present invention relates to an encoder comprising processing circuitry for performing the method according to the first or second aspect of the present invention, or according to any possible implementation of the first or second aspect.

[0064] According to a fourth aspect, the present invention relates to a decoder comprising processing circuitry for carrying out the method according to the first or second aspect of the present invention, or according to any possible implementation of the first or second aspect.

[0065] According to a fifth aspect, the present invention relates to an apparatus for intra-prediction of a current block contained in a picture, the apparatus comprising an intra-prediction unit configured to predict a target sample contained in the current block based on a filtered reference sample. The intra-prediction unit comprises a determination unit configured to determine the size of a major reference side used in the intra-prediction based on an intra-prediction mode among a plurality of available intra-prediction modes that yields the largest non-integer value of the subpixel offset between a target sample among a plurality of target samples in the current block and a reference sample used to predict the target sample in the current block (where the reference sample is a reference sample among a plurality of reference samples contained in a major reference side), and the size of an interpolation filter to be applied to the reference sample contained in the major reference side; and a filtering unit configured to apply an interpolation filter to the reference sample contained in the major reference side in order to obtain a filtered reference sample.

[0066] Therefore, this disclosure facilitates memory efficiency in video coding using intra-prediction.

[0067] In some embodiments, the determination unit determines the size of the main reference side as the sum of the integer part of the largest non-integer value of the subpixel offset, the size of the side of the current block, and half the size of the interpolation filter.

[0068] For example, if the intra prediction mode is greater than the vertical intra prediction mode VER_IDX, the side of the current block is the width of the current block, or if the intra prediction mode is less than the horizontal intra prediction mode HOR_IDX, the side of the current block is the height of the current block.

[0069] For example, each reference sample whose position in the main reference side is greater than twice the size of the current block side is set equal to the sample at the position equal to twice the size of the current block side.
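The rule in [0069] amounts to clamping the sample position. The sketch below assumes 0-based positions and a hypothetical array p of reconstructed neighbouring samples that provides at least positions 0 .. 2*side. The helper name build_main_reference_side is also hypothetical.

```python
from typing import List, Sequence


def build_main_reference_side(p: Sequence[int], side: int, ref_size: int) -> List[int]:
    """Fill ref[0 .. ref_size-1] from the reconstructed samples p.

    Positions beyond 2*side reuse the sample at position 2*side, as in [0069].
    """
    limit = 2 * side
    if len(p) <= limit:
        raise ValueError("p must provide samples at positions 0 .. 2*side")
    return [p[min(i, limit)] for i in range(ref_size)]


ref = build_main_reference_side(list(range(10, 19)), side=4, ref_size=12)
print(ref)  # → [10, 11, 12, 13, 14, 15, 16, 17, 18, 18, 18, 18]
```

With this approach, a main reference side longer than 2*side can be formed without reading any samples beyond position 2*side.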

[0070] In some embodiments, the decision unit determines the size of the main reference side using the maximum M of two values: the size of the side of the block; and the integer part of the maximum non-integer value of the subpixel offset plus half the length of the interpolation filter, or alternatively that sum plus 1.

[0071] The decision unit may be configured not to perform right padding when the maximum M of the two values is equal to the size of the side of the block, and to perform right padding when M is equal to the integer part of the maximum non-integer value of the subpixel offset plus half the length of the interpolation filter, or to that sum plus 1.
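The padding decision in [0071] can be sketched as follows. The sketch assumes the largest subpixel offset is given in 1/32-sample units and that a tie (both values equal) is treated as "M equals the block side", so no padding is applied; the text does not state the tie case. The function name right_padding_decision and the plus_one flag are hypothetical.

```python
from typing import Tuple


def right_padding_decision(side: int, max_offset_1_32: int, filter_len: int,
                           plus_one: bool = False) -> Tuple[int, bool]:
    """Return (M, pad_right) following [0071].

    M = max(side, int_part + filter_len // 2 [+ 1]); right padding is applied
    only when M is not the block side.
    """
    int_part = max_offset_1_32 >> 5          # integer part of the largest offset
    candidate = int_part + filter_len // 2 + (1 if plus_one else 0)
    m = max(side, candidate)
    return m, m != side                      # on a tie, M equals the side: no padding


print(right_padding_decision(8, 70, 4))   # → (8, False)
print(right_padding_decision(4, 100, 4))  # → (5, True)
```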

[0072] Additionally or alternatively, in some embodiments, the decision unit is configured to perform padding by repeating the first and/or last sample of the main reference side to the left and/or right, respectively. If the main reference side is denoted as ref and its size as refS, the padding is expressed as ref[-1] = p[0] and/or ref[refS+1] = p[refS]. Here ref[-1] is the value to the left of the main reference side, p[0] is the value of the first reference sample of the main reference side, ref[refS+1] is the value to the right of the main reference side, and p[refS] is the value of the last reference sample of the main reference side.
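The padding formulas in [0072] can be realized with an offset buffer, since a plain array cannot be indexed at -1. This minimal sketch assumes p holds the samples p[0] .. p[refS]; the helper name pad_main_reference_side is hypothetical.

```python
from typing import List, Sequence, Tuple


def pad_main_reference_side(p: Sequence[int], left: bool = True,
                            right: bool = True) -> Tuple[List[int], int]:
    """Pad per [0072]: ref[-1] = p[0] and/or ref[refS+1] = p[refS].

    Returns (buf, off) such that buf[off + i] holds ref[i]; the offset lets
    index -1 be addressed without Python's negative-index wraparound.
    """
    ref_s = len(p) - 1                    # p holds p[0] .. p[refS]
    head = [p[0]] if left else []         # ref[-1]     = p[0]
    tail = [p[ref_s]] if right else []    # ref[refS+1] = p[refS]
    return head + list(p) + tail, len(head)


buf, off = pad_main_reference_side([5, 6, 7, 8])
print(buf, off)  # → [5, 5, 6, 7, 8, 8] 1
```

Keeping the offset explicit lets an L-tap filter read one sample past either end of the main reference side without any bounds checks in the inner loop.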

[0073] The method according to the second aspect of the present invention may be carried out by the apparatus according to the fifth aspect of the present invention. Further features and implementation forms of the apparatus according to the fifth aspect correspond to the features and implementation forms of the method according to the second aspect or any possible embodiment of the second aspect.

[0074] According to a sixth aspect, an apparatus is provided comprising modules, units, components, or circuits for performing at least some of the steps of the method according to any of the preceding embodiments.

[0075] The apparatus according to this embodiment can be extended to an implementation form corresponding to an implementation form of any prior embodiment of the method. Therefore, the implementation form of the apparatus has the characteristics of the corresponding implementation form of any prior embodiment of the method.

[0076] The advantages of any preceding embodiment of an apparatus are the same as the advantages of the corresponding implementation of any preceding embodiment of a method.

[0077] According to a seventh aspect, the present invention relates to an apparatus for decoding a video stream, comprising a processor and memory. The memory stores instructions causing the processor to perform the method according to the first aspect or any possible embodiment of the first aspect.

[0078] According to an eighth aspect, the present invention relates to a video encoder for encoding multiple pictures into a bitstream, comprising a device for intra-prediction of the current block, as described in any of the embodiments above.

[0079] According to a ninth aspect, the present invention relates to a video decoder for decoding multiple pictures from a bitstream, comprising an apparatus for intra-prediction of the current block, as described in any of the embodiments above.

[0080] According to a tenth aspect, a computer-readable storage medium is proposed that stores instructions which, when executed, cause one or more processors configured to code video data to perform the method according to the first aspect or any possible embodiment of the first aspect.

[0081] According to the eleventh aspect, the present invention relates to a computer program comprising program code for performing a method according to the first aspect or any possible embodiment of the first aspect when executed on a computer.

[0082] Another aspect of this application discloses a decoder comprising processing circuitry configured to carry out the above method.

[0083] Another aspect of this application discloses a computer program product comprising program code for performing the above method.

[0084] In another aspect of this application, a decoder for decoding video data is disclosed. The decoder comprises one or more processors and a non-transitory computer-readable storage medium coupled to the processors and storing a program for execution by the processors, wherein the decoder is configured to perform the above method when the program is executed by the processors.

[0085] The processing circuitry can be implemented in hardware or in a combination of hardware and software, for example, by a software-programmable processor.

[0086] The embodiments, models, and implementations described herein may produce the advantageous effects described above with reference to the first and second embodiments.

[0087] Details of one or more embodiments are described in the accompanying drawings and the following description. Other features, purposes, and advantages will become apparent from the description, drawings, and claims.

[0088] Embodiments of the present invention are described in more detail below with reference to the attached figures and drawings.

[Brief Description of the Drawings]

[0089] [Figure 1A] A block diagram showing an example of a video coding system configured to implement embodiments of the present invention.
[Figure 1B] A block diagram showing another example of a video coding system configured to implement embodiments of the present invention.
[Figure 2] A block diagram showing an example of a video encoder configured to implement embodiments of the present invention.
[Figure 3] A block diagram showing an exemplary structure of a video decoder configured to implement embodiments of the present invention.
[Figure 4] A block diagram showing an example of an encoding or decoding device.
[Figure 5] A block diagram showing another example of an encoding or decoding device.
[Figure 6] A diagram showing the directions and modes of angular intra-prediction and the associated values of p_ang for the vertical prediction directions.
[Figure 7] A diagram showing the transformation of p_ref to p_1,ref for a 4x4 block.
[Figure 8] A diagram showing the construction of p_1,ref for horizontal angular prediction.
[Figure 9] A diagram showing the construction of p_1,ref for vertical angular prediction.
[Figure 10A] A diagram showing the directions and modes of angular intra-prediction and the associated values of p_ang for the set of intra-prediction modes in JEM and BMS-1.
[Figure 10B] A diagram showing the directions and modes of angular intra-prediction and the associated values of p_ang for the set of intra-prediction modes in VVC Draft 2.
[Figure 11] A diagram showing the intra-prediction modes in HEVC [1].
[Figure 12] A diagram showing an example of interpolation filter selection.
[Figure 13] A diagram illustrating QTBT (quadtree plus binary tree) partitioning.
[Figure 14] A diagram showing the orientation of rectangular blocks.
[Figure 15A] A diagram showing an example of intra-prediction of a block from reference samples of the main reference side.
[Figure 15B] A diagram showing an example of intra-prediction of a block from reference samples of the main reference side.
[Figure 15C] A diagram showing an example of intra-prediction of a block from reference samples of the main reference side.
[Figure 16] A diagram showing an example of intra-prediction of a block from reference samples of the main reference side.
[Figure 17] A diagram showing an example of intra-prediction of a block from reference samples of the main reference side.
[Figure 18] A diagram showing an example of intra-prediction of a block from reference samples of the main reference side.
[Figure 19] A diagram showing an interpolation filter used in intra-prediction.
[Figure 20] A diagram showing an interpolation filter used in intra-prediction.
[Figure 21] A diagram showing an interpolation filter used in intra-prediction.
[Figure 22] A diagram showing an interpolation filter used in intra-prediction, configured to implement an embodiment of the present invention.
[Figure 23] A diagram showing an interpolation filter used in intra-prediction, configured to implement an embodiment of the present invention.
[Figure 24] A diagram showing another example of the possible positions of the reference sample line and the corresponding values of the ref_offset variable.
[Figure 25] A flowchart showing the intra-prediction method.
[Figure 26] A block diagram of the intra-prediction device.
[Figure 27] A block diagram illustrating an example structure of a content supply system that provides content distribution services.
[Figure 28] A block diagram showing an example structure of a terminal device.

[Modes for carrying out the invention]

[0090] In the following, unless otherwise explicitly specified, the same reference numeral refers to the same or at least functionally equivalent feature.

[0091] The following description refers to the accompanying drawings, which form part of the present disclosure and show, by way of illustration, specific aspects of embodiments of the present invention or specific aspects in which embodiments of the present invention may be used. It is understood that embodiments of the present invention may be used in other aspects and may comprise structural or logical changes not depicted in the drawings. The following description of embodiments for carrying out the invention is therefore not to be taken in a limiting sense, and the scope of the invention is defined by the appended claims.

[0092] For example, disclosures relating to a method described may also apply to a corresponding device or system configured to perform the method, and vice versa. For example, if one or more specific method steps are described, the corresponding device may include one or more units for performing the described method steps (e.g., one unit performing one or more steps, or multiple units, each performing one or more of the steps), such as functional units, even if such one or more units are not explicitly described or illustrated in the figures. On the other hand, if a particular device is described based on one or more units, such as functional units, the corresponding method may include one step for performing the functionality of one or more units (e.g., one step performing the functionality of one or more units, or multiple steps, each performing one or more of the functionality of the units), even if such one or more steps are not explicitly described or illustrated in the figures. Furthermore, it is understood that the various exemplary embodiments and / or features described herein may be combined with each other unless otherwise specifically stated.

[0093] Video coding typically refers to the processing of a sequence of pictures that make up a video or video sequence. Instead of the term "picture," the terms "frame" or "image" may be used as synonyms in the field of video coding. Video coding (or coding in general) comprises two parts: video encoding and video decoding. Video encoding is performed on the source side and typically involves processing the original video pictures (e.g., by compression) to reduce the amount of data required to represent them (for more efficient storage and/or transmission). Video decoding is performed on the destination side and typically involves the inverse processing compared to the encoder, in order to reconstruct the video pictures. Embodiments referring to "coding" of video pictures (or pictures in general) are to be understood as relating to either "encoding" or "decoding" of video pictures or of the respective video sequences. The combination of the encoding and decoding parts is also called a codec (coding and decoding).

[0094] In the case of lossless (reversible) video coding, the original video pictures can be reconstructed, i.e., the reconstructed video pictures have the same quality as the original video pictures (assuming no transmission loss or other data loss during storage or transmission). In the case of lossy (irreversible) video coding, further compression, for example by quantization, is performed to reduce the amount of data representing the video pictures, which then cannot be completely reconstructed at the decoder, i.e., the quality of the reconstructed video pictures is lower or worse than that of the original video pictures.
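The difference between lossless and lossy coding can be illustrated with a quantization round trip. This toy sketch is not the quantizer of any particular codec. It uses a uniform scalar quantizer with round-half-away-from-zero, with step size 1 standing in for the lossless case and a coarser step for the lossy case. The function names are hypothetical.

```python
from typing import List, Sequence


def quantize(coeffs: Sequence[int], step: int) -> List[int]:
    """Uniform scalar quantization, rounding half away from zero."""
    return [(abs(c) + step // 2) // step * (1 if c >= 0 else -1) for c in coeffs]


def dequantize(levels: Sequence[int], step: int) -> List[int]:
    """Scale quantization levels back to the coefficient range."""
    return [q * step for q in levels]


coeffs = [37, -12, 5, 0]
print(dequantize(quantize(coeffs, 1), 1))    # → [37, -12, 5, 0]  (exact)
print(dequantize(quantize(coeffs, 10), 10))  # → [40, -10, 10, 0] (information lost)
```

The coarser step needs fewer distinct levels to transmit, which is the source of the compression, but the original values can no longer be recovered exactly.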

[0095] Several video coding standards belong to the group of “lossy hybrid video codecs” (i.e., they combine spatial and temporal prediction in the sample domain with 2D transform coding for applying quantization in the transform domain). Each picture of a video sequence is typically partitioned into a set of non-overlapping blocks, and coding is typically performed at the block level. In other words, at the encoder the video is typically processed, i.e., encoded, at the block (video block) level. A prediction block is generated, for example, using spatial (intra-picture) prediction and/or temporal (inter-picture) prediction. The prediction block is subtracted from the current block (the block currently being processed or to be processed) to obtain a residual block. The residual block is transformed and quantized in the transform domain to reduce (compress) the amount of data to be transmitted. At the decoder, the inverse processing compared to the encoder is applied to the encoded or compressed block to reconstruct the current block for representation. Furthermore, the encoder duplicates the decoder processing loop so that both generate identical predictions (e.g., intra-predictions and inter-predictions) and/or reconstructions for processing, i.e., coding, subsequent blocks.
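The block-level loop described in [0095] can be sketched in a few lines. The sketch is a deliberate simplification under stated assumptions: blocks are flat lists of samples, the transform is replaced by the identity, and quantization is a uniform scalar quantizer. All function names are hypothetical. The point it shows is that the encoder reconstructs each block exactly as the decoder will, so both sides hold identical references for predicting later blocks.

```python
from typing import List, Sequence, Tuple


def quantize(values: Sequence[int], step: int) -> List[int]:
    """Uniform scalar quantization, rounding half away from zero."""
    return [(abs(v) + step // 2) // step * (1 if v >= 0 else -1) for v in values]


def reconstruct(prediction: Sequence[int], levels: Sequence[int], step: int) -> List[int]:
    """Decoder side: dequantize the residual and add it to the prediction."""
    return [p + q * step for p, q in zip(prediction, levels)]


def encode_block(current: Sequence[int], prediction: Sequence[int],
                 step: int) -> Tuple[List[int], List[int]]:
    """Encoder side: residual -> quantize, then rebuild as the decoder would."""
    residual = [c - p for c, p in zip(current, prediction)]
    levels = quantize(residual, step)              # transform omitted (identity)
    recon = reconstruct(prediction, levels, step)  # built-in decoder loop
    return levels, recon


current = [52, 55, 61, 66]
prediction = [50, 50, 60, 60]
levels, enc_recon = encode_block(current, prediction, step=4)
dec_recon = reconstruct(prediction, levels, step=4)
assert enc_recon == dec_recon   # both sides hold identical reconstructions
print(levels, enc_recon)        # → [1, 1, 0, 2] [54, 54, 60, 68]
```

If the encoder predicted from the original samples instead of its own reconstruction, its predictions would drift away from the decoder's.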

[0096] In the following, embodiments of a video coding system 10, a video encoder 20, and a video decoder 30 are described with reference to Figures 1 to 3.

[0097] Figure 1A is a schematic block diagram showing an exemplary coding system 10, for example, a video coding system 10 (or coding system 10 for short), that may utilize the techniques of this application. The video encoder 20 (or encoder 20 for short) and the video decoder 30 (or decoder 30 for short) of the video coding system 10 are examples of devices that may be configured to perform techniques in accordance with the various examples described in this application.

[0098] As shown in Figure 1A, the coding system 10 includes a source device 12 configured to provide encoded picture data 21, for example, to a destination device 14 for decoding the encoded picture data 21.

[0099] The source device 12 includes an encoder 20 and may additionally, i.e., optionally, include a picture source 16, a preprocessor (or preprocessing unit) 18, for example, a picture preprocessor 18, and a communication interface or communication unit 22.

[0100] The picture source 16 may comprise or be any kind of picture capture device, e.g., a camera for capturing real-world pictures, and/or any kind of picture generation device, e.g., a computer graphics processor for generating computer-animated pictures, or any other kind of device for acquiring and/or providing real-world pictures, computer-generated pictures (e.g., screen content, virtual reality (VR) pictures), and/or any combination thereof (e.g., augmented reality (AR) pictures). The picture source may comprise any kind of memory or storage storing any of the pictures described above.

[0101] To distinguish it from the output of the preprocessor 18 (preprocessing unit 18) and the processing it performs, the picture or picture data 17 is sometimes referred to as the raw picture or raw picture data 17.

[0102] The preprocessor 18 is configured to receive (raw) picture data 17, perform preprocessing on the picture data 17 to obtain a preprocessed picture 19 or preprocessed picture data 19. The preprocessing performed by the preprocessor 18 may include, for example, cropping, color format conversion (e.g., from RGB to YCbCr), color correction, or denoising. It can be understood that the preprocessing unit 18 may be an optional component.

[0103] The video encoder 20 is configured to receive pre-processed picture data 19 and provide encoded picture data 21 (further details are described below, for example, based on Figure 2).

[0104] The communication interface 22 of the source device 12 may be configured to receive the encoded picture data 21 and transmit the encoded picture data 21 (or a further processed version thereof) over the communication channel 13 to another device, such as the destination device 14 or any other device, for storage or direct reconstruction.

[0105] The destination device 14 includes a decoder 30 (for example, a video decoder 30) and may additionally, i.e., optionally, include a communication interface or communication unit 28, a post-processor 32 (or post-processing unit 32), and a display device 34.

[0106] The communication interface 28 of the destination device 14 is configured to receive the encoded picture data 21 (or a further processed version thereof) directly from the source device 12 or from any other source, such as a storage device (e.g., an encoded picture data storage device), and to provide the encoded picture data 21 to the decoder 30.

[0107] Communication interfaces 22 and 28 may be configured to transmit or receive the encoded picture data 21 or encoded data 13 via a direct communication link between the source device 12 and the destination device 14, for example, a direct wired or wireless connection, or via any type of network, for example, a wired or wireless network or any combination thereof, or any type of private and public network or any combination thereof.

[0108] The communication interface 22 may be configured, for example, to package the encoded picture data 21 into an appropriate format, such as a packet, and / or to process the encoded picture data using any kind of transmit encoding or transmit processing for transmission over a communication link or communication network.

[0109] The communication interface 28, which forms the counterpart to the communication interface 22, may be configured, for example, to receive the transmitted data and process the transmitted data using any kind of corresponding transmit decoding or transmit processing and / or depackaging to obtain the encoded picture data 21.

[0110] Both communication interfaces 22 and 28 may be configured as unidirectional communication interfaces, as indicated by the arrow for the communication channel 13 in Figure 1A pointing from the source device 12 to the destination device 14, or as bidirectional communication interfaces. They may be configured, for example, to send and receive messages to set up a connection, and to acknowledge and exchange any other information related to the communication link and/or data transmission, such as the transmission of encoded picture data.

[0111] The decoder 30 is configured to receive encoded picture data 21 and provide decoded picture data 31 or decoded picture 31 (further details are described below, for example, based on Figure 3 or Figure 5).

[0112] The post-processor 32 of the destination device 14 is configured to post-process the decoded picture data 31 (also called reconstructed picture data), for example, the decoded picture 31, to obtain post-processed picture data 33, for example, the post-processed picture 33. The post-processing performed by the post-processing unit 32 may include, for example, color format conversion (e.g., from YCbCr to RGB), color correction, cropping, or resampling, or any other processing to prepare the decoded picture data 31 for display, for example, by the display device 34.

[0113] The display device 34 of the destination device 14 is configured to receive the post-processed picture data 33 for displaying the picture, for example, to a user or viewer. The display device 34 may be or include any type of display for representing the reconstructed picture, for example, an integrated or external display or monitor. The display may include, for example, a liquid crystal display (LCD), an organic light-emitting diode (OLED) display, a plasma display, a projector, a microLED display, a liquid crystal on silicon (LCoS) display, a digital light processor (DLP), or any other type of display.

[0114] Although Figure 1A shows the source device 12 and the destination device 14 as separate devices, device embodiments may also include both devices or both functionalities, i.e., the source device 12 or its corresponding functionality and the destination device 14 or its corresponding functionality. In such embodiments, the source device 12 or its corresponding functionality and the destination device 14 or its corresponding functionality may be implemented using the same hardware and/or software, by separate hardware and/or software, or by any combination thereof.

[0115] As will become apparent to those skilled in the art based on the description, the functionality of the different units, i.e., the existence and (strict) division of functionality within the source device 12 and / or destination device 14 as shown in Figure 1A, may vary depending on the actual device and application.

[0116] The encoder 20 (e.g., video encoder 20) and the decoder 30 (e.g., video decoder 30) may each be implemented as one or more microprocessors, digital signal processors (DSPs), application-specific integrated circuits (ASICs), field-programmable gate arrays (FPGAs), discrete logic, hardware, or any combination thereof, as shown in Figure 1B. If the techniques are implemented partly in software, a device may store instructions for the software in a suitable non-transitory computer-readable storage medium and execute the instructions in hardware using one or more processors to perform the techniques of this disclosure. Any of the foregoing (including hardware, software, and combinations of hardware and software) may be considered to be one or more processors. Each of the video encoder 20 and the video decoder 30 may be included in one or more encoders or decoders, either of which may be integrated in a respective device as part of a combined encoder/decoder (codec).

[0117] The source device 12 and destination device 14 may comprise any of a wide range of devices, including any type of handheld or stationary device, such as a notebook or laptop computer, mobile phone, smartphone, tablet or tablet computer, camera, desktop computer, set-top box, television, display device, digital media player, video game console, video streaming device (such as a content service server or content distribution server), broadcast receiver device, broadcast transmitter device, etc., and may or may not have an operating system. In some cases, the source device 12 and destination device 14 may be equipped for wireless communication. Therefore, the source device 12 and destination device 14 may be wireless communication devices.

[0118] The video coding system 10 shown in Figure 1A is merely an example, and the techniques of this application may apply to video coding settings (e.g., video encoding or video decoding) that do not necessarily involve any data communication between the encoding device and the decoding device. In other examples, data may be retrieved from local memory, streamed over a network, and so on. A video encoding device may encode data and store it in memory, and/or a video decoding device may retrieve data from memory and decode it. In some examples, encoding and decoding are performed by devices that do not communicate with each other but simply encode data to memory and/or retrieve and decode data from memory.

[0119] Figure 1B is an illustrative diagram of another exemplary video coding system 40 according to an exemplary embodiment, including the encoder 20 of Figure 2 and/or the decoder 30 of Figure 3. The system 40 can implement the various techniques described herein. In the illustrated implementation, the video coding system 40 may include an imaging device 41, a video encoder 20, a video decoder 30 (and/or a video coder implemented via a logic circuit configuration 47 of a processing unit 46), an antenna 42, one or more processors 43, one or more memory stores 44, and/or a display device 45.

[0120] As illustrated, the imaging device 41, antenna 42, processing unit 46, logic circuit configuration 47, video encoder 20, video decoder 30, processor 43, memory store 44, and / or display device 45 may be able to communicate with each other. Although illustrated with both the video encoder 20 and video decoder 30 as described, the video coding system 40 may include only the video encoder 20 or only the video decoder 30 in various examples.

[0121] As shown in the figure, in some examples the video coding system 40 may include an antenna 42. The antenna 42 may be configured, for example, to transmit or receive an encoded bitstream of video data. Furthermore, in some examples the video coding system 40 may include a display device 45. The display device 45 may be configured to display video data. As shown in the figure, in some examples the logic circuit configuration 47 may be implemented via a processing unit 46. The processing unit 46 may include application-specific integrated circuit (ASIC) logic, a graphics processor, a general-purpose processor, etc. The video coding system 40 may also include an optional processor 43, which may similarly include application-specific integrated circuit (ASIC) logic, a graphics processor, a general-purpose processor, etc. In some examples the logic circuit configuration 47 may be implemented via hardware, video coding-specific hardware, etc., and the processor 43 may implement general-purpose software, an operating system, etc. In addition, the memory store 44 may be any type of memory, such as volatile memory (e.g., static random access memory (SRAM), dynamic random access memory (DRAM), etc.) or non-volatile memory (e.g., flash memory, etc.). In a non-limiting example, the memory store 44 may be implemented by cache memory. In some examples, the logic circuit configuration 47 may access the memory store 44 (for example, for the implementation of an image buffer). In other examples, the logic circuit configuration 47 and/or the processing unit 46 may include a memory store (for example, a cache) for the implementation of an image buffer, etc.

[0122] In some examples, the video encoder 20 implemented via a logic circuit configuration may include an image buffer (for example, by either a processing unit 46 or a memory store 44) and a graphics processing unit (for example, by a processing unit 46). The graphics processing unit may be communicatively coupled to the image buffer. The graphics processing unit may include the video encoder 20, implemented via a logic circuit configuration 47, to embody various modules and / or any other encoder systems or subsystems described herein, such as those described with respect to Figure 2. The logic circuit configuration may be configured to perform various operations as described herein.

[0123] The video decoder 30 may be implemented in a similar manner via the logic circuit configuration 47 to embody the various modules and/or any other decoder systems or subsystems described herein, such as those described with respect to the decoder 30 of Figure 3. In some examples, the video decoder 30 implemented via the logic circuit configuration may include an image buffer (for example, via the processing unit 46 or the memory store 44) and a graphics processing unit (for example, via the processing unit 46). The graphics processing unit may be communicatively coupled to the image buffer. The graphics processing unit may include the video decoder 30, implemented via the logic circuit configuration 47 to embody the various modules and/or any other decoder systems or subsystems described herein, such as those described with respect to Figure 3.

[0124] In some examples, the antenna 42 of the video coding system 40 may be configured to receive an encoded bitstream of video data. As described herein, the encoded bitstream may include data related to encoding video frames, such as data related to coding partitions (e.g., transform coefficients or quantized transform coefficients, optional indicators (as described), and/or data defining the coding partitions), indicators, index values, mode selection data, and the like. The video coding system 40 may also include a video decoder 30 coupled to the antenna 42 and configured to decode the encoded bitstream. The display device 45 is configured to present the video frames.

[0125] For the sake of explanation, embodiments of the present invention are described herein with reference to, for example, High Efficiency Video Coding (HEVC) or Versatile Video Coding (VVC), i.e., next-generation video coding standards developed by the Joint Collaborative Team on Video Coding (JCT-VC) of the ITU-T Video Coding Experts Group (VCEG) and the ISO/IEC Moving Picture Experts Group (MPEG). Those skilled in the art will understand that embodiments of the present invention are not limited to HEVC or VVC.

Encoder and encoding method

[0126] Figure 2 shows a schematic block diagram of an exemplary video encoder 20 configured to implement the techniques of the present application. In the example of Figure 2, the video encoder 20 comprises an input unit 201 (or input interface 201), a residual calculation unit 204, a transform processing unit 206, a quantization unit 208, an inverse quantization unit 210, an inverse transform processing unit 212, a reconstruction unit 214, a loop filter unit 220, a decoded picture buffer (DPB) 230, a mode selection unit 260, an entropy coding unit 270, and an output unit 272 (or output interface 272). The mode selection unit 260 may include an inter-prediction unit 244, an intra-prediction unit 254, and a partitioning unit 262. The inter-prediction unit 244 may include a motion estimation unit and a motion compensation unit (not shown). The video encoder 20 shown in Figure 2 may also be called a hybrid video encoder or a video encoder according to a hybrid video codec.

[0127] The residual calculation unit 204, the transform processing unit 206, the quantization unit 208, and the mode selection unit 260 are sometimes referred to as forming the forward signal path of the encoder 20, while the inverse quantization unit 210, the inverse transform processing unit 212, the reconstruction unit 214, the buffer 216, the loop filter 220, the decoded picture buffer (DPB) 230, the inter-prediction unit 244, and the intra-prediction unit 254 are sometimes referred to as forming the backward signal path of the video encoder 20. The backward signal path of the video encoder 20 corresponds to the signal path of the decoder (see video decoder 30 in Figure 3). The inverse quantization unit 210, the inverse transform processing unit 212, the reconstruction unit 214, the loop filter 220, the decoded picture buffer (DPB) 230, the inter-prediction unit 244, and the intra-prediction unit 254 are also referred to as forming the "built-in decoder" of the video encoder 20.

Pictures and picture partitioning (pictures and blocks)

[0128] The encoder 20 may be configured to receive, for example via the input unit 201, a picture 17 (or picture data 17), e.g., a picture of the sequence of pictures that make up a video or video sequence. The received picture or picture data may also be a pre-processed picture 19 (or pre-processed picture data 19). For simplicity, the following description refers to picture 17. Picture 17 may also be referred to as the current picture or the picture to be coded (in particular in video coding, to distinguish the current picture from other pictures of the same video sequence, for example, previously encoded and/or decoded pictures).

[0129] A (digital) picture is, or can be regarded as, a two-dimensional array or matrix of samples with intensity values. A sample in the array may also be referred to as a pixel (short form of picture element) or pel. The number of samples in the horizontal and vertical directions (or axes) of the array or picture defines the size and/or resolution of the picture. For color representation, three color components are typically employed; that is, the picture may be represented by, or include, three sample arrays. In RGB format or color space, a picture comprises corresponding red, green, and blue sample arrays. In video coding, however, each pixel is typically represented in a luminance and chrominance format or color space, e.g., YCbCr, which comprises a luminance component indicated by Y (sometimes L is used instead) and two chrominance components indicated by Cb and Cr. The luminance (luma for short) component Y represents the brightness or gray-level intensity (e.g., as in a grayscale picture), while the two chrominance (chroma for short) components Cb and Cr represent the chromaticity or color information components. Accordingly, a picture in YCbCr format comprises a luminance sample array of luminance sample values (Y) and two chrominance sample arrays of chrominance values (Cb and Cr). A picture in RGB format may be converted or transformed into YCbCr format and vice versa; this process is also known as color transformation or color conversion. If a picture is monochrome, it may comprise only a luminance sample array. Accordingly, a picture may be, for example, an array of luma samples in monochrome format, or an array of luma samples and two corresponding arrays of chroma samples in 4:2:0, 4:2:2, or 4:4:4 color format.
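To make the color conversion and chroma formats concrete, the following minimal Python sketch converts one 8-bit RGB sample to YCbCr and computes the size of each chroma array for a given luma array. The conversion matrix is assumed to be full-range ITU-R BT.601 (the text does not name one), and the helper names `rgb_to_ycbcr` and `chroma_array_size` are hypothetical.

```python
from __future__ import annotations


def rgb_to_ycbcr(r: int, g: int, b: int) -> tuple[int, int, int]:
    """Convert one 8-bit RGB sample to 8-bit YCbCr (full-range BT.601, assumed)."""
    y = 0.299 * r + 0.587 * g + 0.114 * b
    cb = 128 - 0.168736 * r - 0.331264 * g + 0.5 * b
    cr = 128 + 0.5 * r - 0.418688 * g - 0.081312 * b

    def to_8bit(v: float) -> int:
        # Round to the nearest integer and clamp to [0, 255].
        return max(0, min(255, round(v)))

    return to_8bit(y), to_8bit(cb), to_8bit(cr)


# (horizontal, vertical) chroma subsampling factors for each chroma format.
SUBSAMPLING = {"4:2:0": (2, 2), "4:2:2": (2, 1), "4:4:4": (1, 1)}


def chroma_array_size(width: int, height: int, fmt: str) -> tuple[int, int]:
    """Size (width, height) of each chroma array for a width x height luma array."""
    if fmt == "monochrome":
        return (0, 0)  # a monochrome picture has only a luma sample array
    sx, sy = SUBSAMPLING[fmt]
    # Round up so that odd luma dimensions still get a chroma sample.
    return ((width + sx - 1) // sx, (height + sy - 1) // sy)
```

For a 1920×1080 luma array, 4:2:0 gives two 960×540 chroma arrays, 4:2:2 gives two 960×1080 arrays, and 4:4:4 keeps full resolution.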

[0130] Embodiments of the video encoder 20 may include a picture partitioning unit (not shown in Figure 2) configured to partition the picture 17 into a plurality of (typically non-overlapping) picture blocks 203. These blocks may also be referred to as root blocks, macroblocks (H.264/AVC), or coding tree blocks (CTB) or coding tree units (CTU) (H.265/HEVC and VVC). The picture partitioning unit may be configured to use the same block size for all pictures of a video sequence and the corresponding grid defining the block size, or to change the block size between pictures or subsets or groups of pictures, and to partition each picture into the corresponding blocks.
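A fixed block grid of this kind can be sketched as follows. This is an illustrative assumption rather than the normative partitioning process: the hypothetical function `ctu_grid` walks the picture in raster order and simply truncates blocks at the right and bottom picture boundaries when the picture size is not a multiple of the block size.

```python
from __future__ import annotations


def ctu_grid(width: int, height: int, ctu_size: int) -> list[tuple[int, int, int, int]]:
    """Split a picture into non-overlapping blocks in raster-scan order.

    Returns (x, y, w, h) per block. Blocks at the right and bottom boundaries
    are truncated when the picture size is not a multiple of ctu_size.
    """
    blocks = []
    for y in range(0, height, ctu_size):
        for x in range(0, width, ctu_size):
            blocks.append((x, y, min(ctu_size, width - x), min(ctu_size, height - y)))
    return blocks
```

With 128×128 blocks, a 1920×1080 picture yields 15 columns and 9 rows (135 blocks); the last row is only 56 samples high.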

[0131] In further embodiments, the video encoder may be configured to receive blocks 203 of the picture 17 directly, e.g., one, several, or all blocks forming the picture 17. The picture block 203 may also be referred to as the current picture block or the picture block to be coded.

[0132] Like picture 17, picture block 203 is again, or can be regarded as, a two-dimensional array or matrix of samples with intensity values (sample values), although of smaller dimensions than picture 17. In other words, block 203 may comprise, e.g., one sample array (e.g., a luma array in the case of a monochrome picture 17, or a luma or chroma array in the case of a color picture), or three sample arrays (e.g., a luma array and two chroma arrays in the case of a color picture 17), or any other number and/or type of arrays depending on the color format applied. The number of samples in the horizontal and vertical directions (or axes) of block 203 defines the size of block 203. Accordingly, a block may be, for example, an M×N (M-column by N-row) array of samples, or an M×N array of transform coefficients.

[0133] Embodiments of the video encoder 20 as shown in Figure 2 may be configured to encode the picture 17 block by block, e.g., with encoding and prediction performed per block 203.

[0134] Residual calculation The residual calculation unit 204 may be configured to calculate a residual block 205 (also referred to as residual 205) based on the picture block 203 and a prediction block 265 (further details about the prediction block 265 are provided later), e.g., by subtracting the sample values of the prediction block 265 from the sample values of the picture block 203 sample by sample (pixel by pixel), to obtain the residual block 205 in the sample domain.
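The sample-by-sample subtraction can be sketched in a few lines of Python. Blocks are assumed here to be lists of rows of integer samples, and the function name `residual_block` is hypothetical.

```python
from __future__ import annotations


def residual_block(original: list[list[int]], prediction: list[list[int]]) -> list[list[int]]:
    """Residual = original - prediction, computed sample by sample."""
    if len(original) != len(prediction) or any(
        len(o) != len(p) for o, p in zip(original, prediction)
    ):
        raise ValueError("original and prediction blocks must have the same size")
    return [[o - p for o, p in zip(orow, prow)] for orow, prow in zip(original, prediction)]
```

Residual samples can be negative, which is why they are usually held in a signed type with more bits than the input samples.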

[0135] Transform The transform processing unit 206 may be configured to apply a transform, e.g., a discrete cosine transform (DCT) or discrete sine transform (DST), to the sample values of the residual block 205 to obtain transform coefficients 207 in a transform domain. The transform coefficients 207 may also be referred to as transform residual coefficients and represent the residual block 205 in the transform domain.
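As a floating-point reference for the transform step, the sketch below applies the orthonormal 2-D DCT-II separably, as C·X·C^T, and its inverse C^T·Y·C. It assumes square blocks and omits the DST; the function names are hypothetical, and real codecs use integer approximations instead of this exact transform.

```python
import math


def dct_matrix(n):
    """Orthonormal DCT-II basis: row k holds the k-th basis function."""
    rows = []
    for k in range(n):
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        rows.append([scale * math.cos(math.pi * (2 * i + 1) * k / (2 * n)) for i in range(n)])
    return rows


def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def transpose(a):
    return [list(row) for row in zip(*a)]


def forward_dct2d(block):
    """Residual block (square) -> transform coefficients: C * X * C^T."""
    c = dct_matrix(len(block))
    return matmul(matmul(c, block), transpose(c))


def inverse_dct2d(coeffs):
    """Transform coefficients -> residual block: C^T * Y * C (C is orthonormal)."""
    c = dct_matrix(len(coeffs))
    return matmul(matmul(transpose(c), coeffs), c)
```

A flat 4×4 block of value 10 maps to a single DC coefficient of 40 with all other coefficients zero, which illustrates the energy compaction that makes the transform useful before quantization.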

[0136] The transform processing unit 206 may be configured to apply integer approximations of DCT/DST, such as the transforms specified for H.265/HEVC. Compared to an orthogonal DCT transform, such integer approximations are typically scaled by a certain factor. To preserve the norm of a residual block processed by the forward and inverse transforms, additional scaling factors are applied as part of the transform process. The scaling factors are typically chosen based on certain constraints, such as scaling factors being a power of two for shift operations, the bit depth of the transform coefficients, and the trade-off between accuracy and implementation cost. Specific scaling factors are, for example, specified for the inverse transform, e.g., by the inverse transform processing unit 212 (and for the corresponding inverse transform, e.g., by the inverse transform processing unit 312 at the video decoder 30), and corresponding scaling factors for the forward transform, e.g., by the transform processing unit 206 at the encoder 20, may be specified accordingly.
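To illustrate the scaling described above, the check below uses the 4-point integer DCT matrix of H.265/HEVC. Its rows are exactly orthogonal, and their squared norms are close to 2^14. Each basis vector is therefore about 2^7 times its orthonormal counterpart, a power-of-two factor that can be removed with shifts. The helper `gram` is hypothetical.

```python
# 4-point integer DCT approximation specified in H.265/HEVC.
HEVC_DCT4 = [
    [64, 64, 64, 64],
    [83, 36, -36, -83],
    [64, -64, -64, 64],
    [36, -83, 83, -36],
]


def gram(m):
    """Matrix of dot products between every pair of rows of m."""
    return [[sum(a * b for a, b in zip(r1, r2)) for r2 in m] for r1 in m]


g = gram(HEVC_DCT4)
# Off-diagonal entries: zero means the rows are exactly orthogonal.
off_diagonal = [g[i][j] for i in range(4) for j in range(4) if i != j]
# Diagonal entries: squared row norms, each within 16 of 2**14 = 16384.
row_norms_sq = [g[i][i] for i in range(4)]
```

The small deviation of rows 1 and 3 (16370 instead of 16384) is the approximation error mentioned in the text. It is one reason additional scaling factors and rounding are part of the transform process.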

[0137] Embodiments of the video encoder 20 (in particular, the transform processing unit 206) may be configured to output transform parameters, e.g., a type of transform or transforms, either directly or encoded or compressed via the entropy coding unit 270, so that, e.g., the video decoder 30 can receive and use the transform parameters for decoding.

[0138] Quantization The quantization unit 208 may be configured to quantize the transform coefficients 207 to obtain quantized coefficients 209, e.g., by applying scalar quantization or vector quantization. The quantized coefficients 209 may also be referred to as quantized transform coefficients 209 or quantized residual coefficients 209.

[0139] The quantization process may reduce the bit depth associated with some or all of the transform coefficients 207. For example, an n-bit transform coefficient may be rounded down to an m-bit transform coefficient during quantization, where n is greater than m. The degree of quantization may be modified by adjusting a quantization parameter (QP). For example, for scalar quantization, different scaling may be applied to achieve finer or coarser quantization. Smaller quantization step sizes correspond to finer quantization, whereas larger quantization step sizes correspond to coarser quantization. The applicable quantization step size may be indicated by the quantization parameter (QP). The quantization parameter may, for example, be an index into a predefined set of applicable quantization step sizes. For example, small quantization parameters may correspond to fine quantization (small quantization step sizes) and large quantization parameters to coarse quantization (large quantization step sizes), or vice versa. Quantization may include division by a quantization step size, and the corresponding inverse quantization, e.g., by the inverse quantization unit 210, may include multiplication by the quantization step size. Embodiments according to some standards, e.g., HEVC, may be configured to use a quantization parameter to determine the quantization step size. Generally, the quantization step size may be calculated from the quantization parameter using a fixed-point approximation of an equation that includes division. Additional scaling factors may be introduced for quantization and inverse quantization to restore the norm of the residual block, which might be modified by the scaling used in the fixed-point approximation of the equation for the quantization step size and quantization parameter. In one example implementation, the scaling of the inverse transform and of the inverse quantization may be combined. Alternatively, customized quantization tables may be used and signaled from the encoder to the decoder, e.g., in a bitstream. Quantization is a lossy operation, and the loss increases with increasing quantization step size.
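The QP-to-step-size relation and the quantize/dequantize pair can be sketched as follows. The step size uses the HEVC relation Qstep = 2^((QP - 4) / 6), under which the step doubles every 6 QP values. For readability, the sketch computes in floating point with round-to-nearest rather than with the fixed-point scaling tables and rounding offsets a real codec would use (an assumption). The function names are hypothetical.

```python
def qstep(qp: int) -> float:
    """HEVC-style quantization step size: 1.0 at QP 4, doubling every 6 QP."""
    return 2.0 ** ((qp - 4) / 6.0)


def quantize(coeff: float, qp: int) -> int:
    """Uniform scalar quantization: divide by the step and round to the nearest level."""
    sign = -1 if coeff < 0 else 1
    return sign * int(abs(coeff) / qstep(qp) + 0.5)


def dequantize(level: int, qp: int) -> float:
    """Inverse quantization: multiply the level by the step size."""
    return level * qstep(qp)
```

At QP 22 the step is 8, so a coefficient of 100 becomes level 13 and is reconstructed as 104. The error never exceeds half a step, and it grows as QP (and hence the step size) increases, which is the loss mentioned above.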

[0140] Embodiments of the video encoder 20 (in particular, the quantization unit 208) may be configured to output quantization parameters (QP), e.g., directly or encoded via the entropy coding unit 270, so that, e.g., the video decoder 30 can receive and apply the quantization parameters for decoding.

[0141] Inverse quantization The inverse quantization unit 210 is configured to obtain dequantized coefficients 211 by applying to the quantized coefficients the inverse of the quantization scheme applied by the quantization unit 208, e.g., based on or using the same quantization step size as the quantization unit 208. The dequantized coefficients 211 may also be referred to as dequantized residual coefficients 211 and correspond to the transform coefficients 207, although they are typically not identical to the transform coefficients because of the loss caused by quantization.

[0142] Inverse transform The inverse transform processing unit 212 is configured to apply the inverse of the transform applied by the transform processing unit 206, e.g., an inverse discrete cosine transform (DCT), an inverse discrete sine transform (DST), or another inverse transform, to obtain a reconstructed residual block 213 (or corresponding dequantized coefficients 213) in the sample domain. The reconstructed residual block 213 may also be referred to as the transform block 213.

[0143] Reconstruction The reconstruction unit 214 (e.g., an adder or summer 214) is configured to add the transform block 213 (i.e., the reconstructed residual block 213) to the prediction block 265 to obtain a reconstructed block 215 in the sample domain, e.g., by adding the sample values of the reconstructed residual block 213 and the sample values of the prediction block 265 sample by sample.
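The sample-wise addition can be sketched as below. Clipping the sum to the valid sample range [0, 2^bitDepth - 1] is not stated in the text. It is an assumption here, although codecs commonly do this. The function name `reconstruct` is hypothetical.

```python
def reconstruct(residual, prediction, bit_depth=8):
    """Add residual and prediction sample by sample, clipping to the sample range."""
    max_val = (1 << bit_depth) - 1
    return [
        [min(max(r + p, 0), max_val) for r, p in zip(rrow, prow)]
        for rrow, prow in zip(residual, prediction)
    ]
```

Without the clip, quantization error in the residual could push reconstructed samples outside the range representable at the picture's bit depth.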

[0144] Filtering The loop filter unit 220 (or "loop filter" 220 for short) is configured to filter the reconstructed block 215 to obtain a filtered block 221, or in general, to filter reconstructed samples to obtain filtered samples. The loop filter unit is, e.g., configured to smooth pixel transitions or otherwise improve the video quality. The loop filter unit 220 may comprise one or more loop filters, such as a deblocking filter, a sample-adaptive offset (SAO) filter, or one or more other filters, e.g., a bilateral filter, an adaptive loop filter (ALF), a sharpening filter, a smoothing filter, or a collaborative filter, or any combination thereof. Although the loop filter unit 220 is shown in Figure 2 as an in-loop filter, in other configurations the loop filter unit 220 may be implemented as a post-loop filter. The filtered block 221 may also be referred to as the filtered reconstructed block 221. The decoded picture buffer 230 may store the reconstructed coding blocks after the loop filter unit 220 has performed the filtering operations on them.
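As one concrete example of the loop filters listed above, the sketch below implements the band-offset variant of an HEVC-style SAO filter. The sample range is divided into 32 equal bands, and four consecutive bands starting at a signaled band position (wrapping modulo 32) receive signaled offsets. The function name and the flat list of samples are simplifications, and the edge-offset variant is not shown.

```python
def sao_band_offset(samples, band_position, offsets, bit_depth=8):
    """Apply an HEVC-style SAO band offset to a flat list of samples.

    Each sample's band is its 5 most significant bits (32 bands). Bands
    band_position .. band_position + 3 (mod 32) get offsets[0..3]; others pass.
    """
    shift = bit_depth - 5
    max_val = (1 << bit_depth) - 1
    out = []
    for s in samples:
        k = ((s >> shift) - band_position) & 31  # position relative to the first offset band
        if k < 4:
            s = min(max(s + offsets[k], 0), max_val)  # add the offset, keep the valid range
        out.append(s)
    return out
```

With 8-bit samples each band spans 8 values, so with band position 12 the samples 96 to 127 are adjusted and everything else is left unchanged.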

[0145] Embodiments of the video encoder 20 (in particular, the loop filter unit 220) may be configured to output loop filter parameters (such as sample-adaptive offset information), e.g., directly or encoded via the entropy coding unit 270, so that, e.g., the decoder 30 can receive and apply the same loop filter parameters or respective loop filters for decoding.

[0146] Decoded picture buffer The decoded picture buffer (DPB) 230 may be a memory that stores reference pictures, or in general reference picture data, for encoding video data by the video encoder 20. The DPB 230 may be formed by any of a variety of memory devices, such as dynamic random access memory (DRAM), including synchronous DRAM (SDRAM), magnetoresistive RAM (MRAM), resistive RAM (RRAM), or other types of memory devices. The decoded picture buffer (DPB) 230 may be configured to store one or more filtered blocks 221. The decoded picture buffer 230 may be further configured to store other previously filtered blocks, e.g., previously reconstructed and filtered blocks 221, of the same current picture or of different pictures, e.g., previously reconstructed pictures. It may provide complete previously reconstructed, i.e., decoded, pictures (and corresponding reference blocks and samples) and/or a partially reconstructed current picture (and corresponding reference blocks and samples), for example for inter prediction. The decoded picture buffer (DPB) 230 may also be configured to store one or more unfiltered reconstructed blocks 215, or in general unfiltered reconstructed samples, e.g., if the reconstructed block 215 is not filtered by the loop filter unit 220, or any other further processed version of the reconstructed blocks or samples.

[0147] Mode selection (partitioning and prediction) The mode selection unit 260 comprises the partitioning unit 262, the inter-prediction unit 244, and the intra-prediction unit 254. It is configured to receive or obtain original picture data, e.g., an original block 203 (the current block 203 of the current picture 17), and reconstructed picture data, e.g., filtered and/or unfiltered reconstructed samples or blocks of the same (current) picture and/or of one or more previously decoded pictures, e.g., from the decoded picture buffer 230 or other buffers (e.g., a line buffer, not shown). The reconstructed picture data is used as reference picture data for prediction, e.g., inter prediction or intra prediction, to obtain a prediction block 265 or predictor 265.

[0148] The mode selection unit 260 may be configured to determine or select a partitioning for the current block (including no partitioning) and a prediction mode (e.g., an intra-prediction or inter-prediction mode), and to generate a corresponding prediction block 265, which is used for the calculation of the residual block 205 and for the reconstruction of the reconstructed block 215.

[0149] Embodiments of the mode selection unit 260 may be configured to select the partitioning and the prediction mode (e.g., from those supported by or available to the mode selection unit 260) that provide the best match or, in other words, the minimum residual (a minimum residual means better compression for transmission or storage), or the minimum signaling overhead (which likewise means better compression for transmission or storage), or that consider or balance both. The mode selection unit 260 may be configured to determine the partitioning and prediction mode based on rate-distortion optimization (RDO), i.e., to select the prediction mode that provides the minimum rate distortion. In this context, terms such as “best”, “minimum”, and “optimum” do not necessarily refer to an overall “best”, “minimum”, or “optimum”. They may also refer to the fulfillment of a termination or selection criterion, such as a value exceeding or falling below a threshold, or other constraints that potentially lead to a “sub-optimal selection” but reduce complexity and processing time.
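Rate-distortion optimization is commonly formulated as minimizing a Lagrangian cost J = D + λ·R. The following sketch assumes that formulation. It leaves the choice of the Lagrange multiplier `lam` open, since its derivation from QP is encoder-specific. An optional `good_enough` threshold models the termination criterion described above, which may yield a sub-optimal choice. The class `Candidate` and the function `select_mode` are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Candidate:
    """One partitioning/prediction-mode choice evaluated by the encoder (hypothetical)."""
    name: str
    distortion: float  # e.g. sum of squared errors of the reconstructed block
    rate_bits: float   # estimated number of bits to signal the choice and its residual


def select_mode(candidates: list[Candidate], lam: float,
                good_enough: float | None = None) -> Candidate:
    """Return the candidate with the lowest cost J = D + lam * R.

    If good_enough is given, stop at the first candidate whose cost falls below it:
    a termination criterion that may give a sub-optimal choice but saves time.
    """
    if not candidates:
        raise ValueError("no candidates to choose from")
    best, best_cost = candidates[0], float("inf")
    for cand in candidates:
        cost = cand.distortion + lam * cand.rate_bits
        if cost < best_cost:
            best, best_cost = cand, cost
        if good_enough is not None and cost < good_enough:
            break
    return best
```

A larger `lam` favors cheap-to-signal choices, while a smaller one favors low distortion. The early-exit threshold shows why “best” in the text may mean “good enough under a time budget”.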

[0150] In other words, the partitioning unit 262 may be configured to partition the block 203 into smaller block partitions or sub-blocks (which again form blocks), e.g., by iteratively using quad-tree partitioning (QT), binary-tree partitioning (BT), or ternary-tree partitioning (TT), or any combination thereof, and to perform the prediction for each of the block partitions or sub-blocks. Here, the mode selection comprises the selection of the tree structure of the partitioned block 203, and the prediction modes are applied to each of the block partitions or sub-blocks.

[0151] The partitioning (e.g., by the partitioning unit 262) and prediction (by the inter-prediction unit 244 and the intra-prediction unit 254) processes performed by the example video encoder 20 are described in more detail below.

[0152] Partitioning The partitioning unit 262 may partition (or split) the current block 203 into smaller partitions, e.g., smaller blocks of square or rectangular size. These smaller blocks (also referred to as sub-blocks) may be further partitioned into even smaller partitions. This is also referred to as tree partitioning or hierarchical tree partitioning. For example, a root block at root tree level 0 (hierarchy level 0, depth 0) may be recursively partitioned into two or more blocks of the next lower tree level, i.e., nodes at tree level 1 (hierarchy level 1, depth 1). These blocks may again be partitioned into two or more blocks of the next lower level, e.g., tree level 2 (hierarchy level 2, depth 2), and so on. Partitioning stops when a termination criterion is fulfilled, e.g., when a maximum tree depth or minimum block size is reached. Blocks that are not further partitioned are also referred to as leaf blocks or leaf nodes of the tree. A tree using partitioning into two partitions is referred to as a binary tree (BT), a tree using partitioning into three partitions as a ternary tree (TT), and a tree using partitioning into four partitions as a quad tree (QT).
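The recursive tree partitioning can be sketched as follows. In this sketch, a caller-supplied `decide(rect, depth)` function stands in for the encoder's split decision (in practice, an RDO search). Quad-tree, binary and 1:2:1 ternary splits are supported, and recursion stops at a maximum depth, at a minimum block size, or when `decide` returns `None`. All names and the simplified termination rules are assumptions for illustration.

```python
def split(rect, mode):
    """Return the sub-rectangles (x, y, w, h) produced by one split of the given mode."""
    x, y, w, h = rect
    if mode == "QT":  # four equal quadrants
        hw, hh = w // 2, h // 2
        return [(x, y, hw, hh), (x + hw, y, hw, hh), (x, y + hh, hw, hh), (x + hw, y + hh, hw, hh)]
    if mode == "BT_H":  # two halves, one above the other
        return [(x, y, w, h // 2), (x, y + h // 2, w, h // 2)]
    if mode == "BT_V":  # two halves, side by side
        return [(x, y, w // 2, h), (x + w // 2, y, w // 2, h)]
    if mode == "TT_H":  # three horizontal stripes in a 1:2:1 ratio
        q = h // 4
        return [(x, y, w, q), (x, y + q, w, 2 * q), (x, y + 3 * q, w, q)]
    if mode == "TT_V":  # three vertical stripes in a 1:2:1 ratio
        q = w // 4
        return [(x, y, q, h), (x + q, y, 2 * q, h), (x + 3 * q, y, q, h)]
    raise ValueError(f"unknown split mode {mode!r}")


def partition(rect, decide, depth=0, max_depth=4, min_size=4):
    """Recursively partition rect; decide(rect, depth) returns a split mode or None."""
    _, _, w, h = rect
    if depth >= max_depth or min(w, h) <= min_size:
        return [rect]  # termination criterion reached: leaf block
    mode = decide(rect, depth)
    if mode is None:
        return [rect]  # encoder chose not to split further: leaf block
    leaves = []
    for sub in split(rect, mode):
        leaves.extend(partition(sub, decide, depth + 1, max_depth, min_size))
    return leaves
```

For example, a decision function that applies QT at depth 0 and a vertical binary split only to the top-left quadrant turns a 64×64 root block into five leaves: two 16×32 blocks and three 32×32 blocks.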

[0153] As mentioned before, the term “block” as used herein may refer to a portion of a picture, in particular a square or rectangular portion. With reference to HEVC and VVC, for example, a block may be or correspond to a coding tree unit (CTU), a coding unit (CU), a prediction unit (PU), or a transform unit (TU), and/or to the corresponding blocks, e.g., a coding tree block (CTB), a coding block (CB), a transform block (TB), or a prediction block (PB).

[0154] For example, a coding tree unit (CTU) may be or comprise a CTB of luma samples and two corresponding CTBs of chroma samples of a picture that has three sample arrays, or a CTB of samples of a monochrome picture or of a picture that is coded using three separate color planes, together with the syntax structures used to code the samples. Correspondingly, a coding tree block (CTB) may be an N×N block of samples for some value of N, such that the division of a component into CTBs is a partitioning. A coding unit (CU) may be or comprise a coding block of luma samples and two corresponding coding blocks of chroma samples of a picture that has three sample arrays, or a coding block of samples of a monochrome picture or of a picture that is coded using three separate color planes, together with the syntax structures used to code the samples. Correspondingly, a coding block (CB) may be an M×N block of samples for some values of M and N, such that the division of a CTB into coding blocks is a partitioning.

[0155] In embodiments using HEVC, for example, a coding tree unit (CTU) may be split into CUs by using a quad-tree structure denoted as a coding tree. The decision whether to code a picture area using inter-picture (temporal) or intra-picture (spatial) prediction is made at the CU level. Each CU may be further split into one, two, or four PUs according to the PU splitting type. Within one PU, the same prediction process is applied, and the relevant information is transmitted to the decoder on a PU basis. After the residual block has been obtained by applying the prediction process based on the PU splitting type, the CU may be partitioned into transform units (TUs) according to another quad-tree structure similar to the coding tree for the CU.

[0156] In embodiments of the latest video coding standard currently under development, referred to as Versatile Video Coding (VVC), for example, quad-tree and binary tree (QTBT) partitioning is used to partition a coding block. In the QTBT block structure, a CU can have either a square or a rectangular shape. For example, a coding tree unit (CTU) is first partitioned by a quad-tree structure. The quad-tree leaf nodes are further partitioned by a binary tree or ternary (or triple) tree structure. The partitioning tree leaf nodes are called coding units (CUs), and that segmentation is used for prediction and transform processing without any further partitioning. This means that the CU, PU, and TU have the same block size in the QTBT coding block structure. In parallel, multiple partitioning, e.g., ternary tree partitioning, has also been proposed for use together with the QTBT block structure.

[0157] In one example, the mode selection unit 260 of the video encoder 20 may be configured to perform any combination of the segmentation techniques described herein.

[0158] As described above, the video encoder 20 is configured to determine or select the best or an optimum prediction mode from a set of (predetermined) prediction modes. The set of prediction modes may comprise, e.g., intra-prediction modes and/or inter-prediction modes.

[0159] Intra prediction The set of intra-prediction modes may comprise 35 different intra-prediction modes, e.g., non-directional modes such as DC (or mean) mode and planar mode, or directional modes, as defined in HEVC. Alternatively, it may comprise 67 different intra-prediction modes, e.g., non-directional modes such as DC (or mean) mode and planar mode, or directional modes, as defined for VVC.
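The two mode sets share the same layout: mode 0 is planar, mode 1 is DC, and the remaining indices are angular (directional) modes. In HEVC, horizontal and vertical prediction are modes 10 and 26; in VVC they are modes 18 and 50. The hypothetical helper below classifies a mode index accordingly. It does not model VVC's wide-angle modes for non-square blocks.

```python
# (horizontal mode, vertical mode) indices for each mode set.
HOR_VER = {35: (10, 26), 67: (18, 50)}


def classify_intra_mode(mode: int, num_modes: int = 67) -> str:
    """Classify an intra-prediction mode index for HEVC (35 modes) or VVC (67 modes)."""
    if num_modes not in HOR_VER:
        raise ValueError("num_modes must be 35 (HEVC) or 67 (VVC)")
    if not 0 <= mode < num_modes:
        raise ValueError("mode index out of range")  # wide-angle modes are not modelled
    if mode == 0:
        return "planar"
    if mode == 1:
        return "DC"
    hor, ver = HOR_VER[num_modes]
    if mode == hor:
        return "angular (horizontal)"
    if mode == ver:
        return "angular (vertical)"
    return "angular"
```

Going from 35 to 67 modes keeps planar and DC and roughly doubles the angular resolution (33 to 65 directions).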

[0160] The intra-prediction unit 254 is configured to use reconstructed samples of neighboring blocks of the same current picture to generate an intra-prediction block 265 according to an intra-prediction mode of the set of intra-prediction modes.
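As a minimal example of prediction from reconstructed neighboring samples, the sketch below implements DC mode for a square n×n block. Every predicted sample is the rounded mean of the n samples above and the n samples to the left, computed HEVC-style as (sum + n) >> (log2(n) + 1). It omits the boundary smoothing that HEVC applies to small luma DC blocks and VVC's handling of non-square blocks. The name `dc_predict` is hypothetical.

```python
def dc_predict(top, left, n):
    """DC intra prediction for an n x n block (n a power of two).

    top and left hold the reconstructed neighboring samples above and to the
    left of the block; every predicted sample is their rounded mean.
    """
    if n <= 0 or n & (n - 1):
        raise ValueError("n must be a positive power of two")
    log2n = n.bit_length() - 1
    dc = (sum(top[:n]) + sum(left[:n]) + n) >> (log2n + 1)
    return [[dc] * n for _ in range(n)]
```

Because both divisor and rounding offset are powers of two, the mean needs only an addition and a shift, which matters for hardware decoders.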

[0161] The intra-prediction unit 254 (or generally, the mode selection unit 260) is further configured to output intra-prediction parameters (or generally, information indicating a selected intra-prediction mode for a block) to the entropy coding unit 270 in the form of syntax elements 266 to be included in the coded picture data 21, for example, so that the video decoder 30 can receive and use the prediction parameters for decoding.

[0162] Inter prediction The set of (or possible) inter-prediction modes depends on the available reference pictures (i.e., previous at least partially decoded pictures, e.g., stored in the DPB 230) and other inter-prediction parameters. Such parameters include, e.g., whether the whole reference picture or only a part of it, e.g., a search window area around the area of the current block, is used for searching for the best-matching reference block, and/or whether pixel interpolation, e.g., half-pel and/or quarter-pel interpolation, is applied.

[0163] In addition to the prediction modes described above, skip mode and / or direct mode may be applied.

[0164] The inter-prediction unit 244 may include a motion estimation (ME) unit and a motion compensation (MC) unit (neither shown in Figure 2). For motion estimation, the motion estimation unit may be configured to receive or obtain the picture block 203 (the current picture block 203 of the current picture 17) and a decoded picture 231, or at least one or more previously reconstructed blocks, e.g., reconstructed blocks of one or more other/different previously decoded pictures 231. For example, a video sequence may comprise the current picture and the previously decoded pictures 231; in other words, the current picture and the previously decoded pictures 231 may be part of, or form, the sequence of pictures making up the video sequence.

[0165] The encoder 20 may, for example, be configured to select a reference block from a plurality of reference blocks of the same or different pictures among a plurality of other pictures, and to provide the reference picture (or reference picture index) and/or an offset (spatial offset) between the position (x, y coordinates) of the reference block and the position of the current block as inter-prediction parameters to the motion estimation unit. This offset is also called a motion vector (MV).
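One simple way to find such an offset is a full search over a square window that minimizes the sum of absolute differences (SAD). This is only one of many possible motion estimation strategies and is assumed here purely for illustration. The sketch works at integer-sample precision on single-component pictures stored as lists of rows. The names `sad` and `full_search` are hypothetical.

```python
def sad(cur, ref, bx, by, dx, dy, size):
    """SAD between the current block at (bx, by) and the reference block at (bx+dx, by+dy)."""
    return sum(
        abs(cur[by + j][bx + i] - ref[by + dy + j][bx + dx + i])
        for j in range(size)
        for i in range(size)
    )


def full_search(cur, ref, bx, by, size, search_range):
    """Return ((dx, dy), cost) for the integer motion vector with minimum SAD.

    The motion vector points from the current block to its match in the
    reference picture; candidates falling outside the reference are skipped.
    """
    height, width = len(ref), len(ref[0])
    best_mv, best_cost = (0, 0), None
    for dy in range(-search_range, search_range + 1):
        for dx in range(-search_range, search_range + 1):
            x, y = bx + dx, by + dy
            if x < 0 or y < 0 or x + size > width or y + size > height:
                continue
            cost = sad(cur, ref, bx, by, dx, dy, size)
            if best_cost is None or cost < best_cost:
                best_mv, best_cost = (dx, dy), cost
    return best_mv, best_cost
```

The cost grows with the square of the search range, which is why practical encoders use fast search patterns and restrict the search window, as described for the inter-prediction parameters above.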

[0166] The motion compensation unit is configured to obtain, e.g., receive, inter-prediction parameters and to perform inter prediction based on or using the inter-prediction parameters to obtain an inter-prediction block 265. Motion compensation, performed by the motion compensation unit, may involve fetching or generating the prediction block based on the motion/block vector determined by motion estimation, possibly performing interpolation to sub-pixel precision. Interpolation filtering may generate additional pixel samples from known pixel samples, thus potentially increasing the number of candidate prediction blocks that may be used to code a picture block. Upon receiving the motion vector for the PU of the current picture block, the motion compensation unit may locate the prediction block to which the motion vector points in one of the reference picture lists.
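As an example of interpolation to sub-pixel precision, the sketch below generates the half-sample positions of one row of samples with the 8-tap HEVC luma half-sample filter (-1, 4, -11, 40, 40, -11, 4, -1), whose taps sum to 64. It is a single 1-D pass with a rounding shift of 6, edge padding and clipping. These are simplifying assumptions: HEVC itself keeps higher-precision intermediates for 2-D interpolation. The name `half_pel_row` is hypothetical.

```python
# HEVC luma half-sample interpolation filter taps (they sum to 64).
HALF_PEL_TAPS = (-1, 4, -11, 40, 40, -11, 4, -1)


def half_pel_row(row, bit_depth=8):
    """Interpolate the half-sample positions of a row of integer samples.

    Output k lies between row[k] and row[k + 1]; positions outside the row
    are replaced by the nearest edge sample (padding).
    """
    n = len(row)
    max_val = (1 << bit_depth) - 1
    out = []
    for k in range(n - 1):
        acc = 0
        for t, c in enumerate(HALF_PEL_TAPS):
            idx = min(max(k - 3 + t, 0), n - 1)  # taps cover row[k-3] .. row[k+4]
            acc += c * row[idx]
        out.append(min(max((acc + 32) >> 6, 0), max_val))  # normalize by 64, then clip
    return out
```

On a linear ramp the symmetric filter returns the exact midpoint (e.g., 55 between 50 and 60), while the negative outer taps preserve sharper edges than simple bilinear averaging would.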

[0167] The motion compensation unit may also generate syntax elements related to blocks and video slices for use by the video decoder 30 when decoding picture blocks of video slices.

[0168] Entropy coding The entropy coding unit 270 is configured to apply, or bypass (no compression), an entropy coding algorithm or scheme to the quantized coefficients 209, inter-prediction parameters, intra-prediction parameters, loop filter parameters, and/or other syntax elements to obtain encoded picture data 21. Examples of such schemes are a variable-length coding (VLC) scheme, a context-adaptive VLC (CAVLC) scheme, an arithmetic coding scheme, binarization, context-adaptive binary arithmetic coding (CABAC), syntax-based context-adaptive binary arithmetic coding (SBAC), probability interval partitioning entropy (PIPE) coding, or another entropy coding method or technique. The encoded picture data 21 may be output via the output unit 272, e.g., in the form of an encoded bitstream 21, so that, e.g., the video decoder 30 can receive and use the parameters for decoding. The encoded bitstream 21 may be transmitted to the video decoder 30 or stored in memory for later transmission to or retrieval by the video decoder 30.
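As a small example of the variable-length coding schemes listed above, the sketch below encodes and decodes unsigned Exp-Golomb codes, ue(v), which H.264/AVC and H.265/HEVC use for many syntax elements. The code for a value v is the binary form of v + 1 preceded by one fewer leading zeros than it has bits. Bit strings are used instead of a real bit writer for clarity, and the function names are hypothetical. CABAC is not shown.

```python
def ue_encode(value: int) -> str:
    """Unsigned Exp-Golomb code ue(v) of a non-negative integer, as a bit string."""
    if value < 0:
        raise ValueError("ue(v) codes non-negative integers only")
    info = bin(value + 1)[2:]             # binary of value + 1, without the '0b' prefix
    return "0" * (len(info) - 1) + info   # prefix of leading zeros, then the info bits


def ue_decode(bits: str, pos: int = 0):
    """Decode one ue(v) code starting at bits[pos]; return (value, next_pos)."""
    zeros = 0
    while bits[pos + zeros] == "0":
        zeros += 1
    start = pos + zeros
    value = int(bits[start:start + zeros + 1], 2) - 1
    return value, start + zeros + 1
```

Small values get short codes (0 → "1", 1 → "010", 2 → "011", 3 → "00100"), and the code is prefix-free, so consecutive codes can be parsed from a bitstream without separators.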

[0169] Other structural variations of the video encoder 20 may be used to encode the video stream. For example, a non-transform-based encoder 20 may quantize the residual signal directly, without the transform processing unit 206, for certain blocks or frames. In another implementation, the encoder 20 may have the quantization unit 208 and the inverse quantization unit 210 combined into a single unit.

[0170] Decoder and decoding method Figure 3 shows an example of a video decoder 30 configured to implement the techniques of this application. The video decoder 30 is configured to receive encoded picture data 21 (e.g., an encoded bitstream 21), e.g., encoded by the encoder 20, to obtain a decoded picture 331. The encoded picture data or bitstream comprises information for decoding the encoded picture data, e.g., data representing picture blocks of an encoded video slice and associated syntax elements.

[0171] In the example of Figure 3, the decoder 30 comprises an entropy decoding unit 304, an inverse quantization unit 310, an inverse transform processing unit 312, a reconstruction unit 314 (e.g., an adder 314), a loop filter 320, a decoded picture buffer (DPB) 330, an inter-prediction unit 344, and an intra-prediction unit 354. The inter-prediction unit 344 may be or include a motion compensation unit. In some examples, the video decoder 30 may perform a decoding pass generally reciprocal to the encoding pass described with respect to the video encoder 20 of Figure 2.

[0172] As explained with regard to the encoder 20, the inverse quantization unit 210, the inverse transform processing unit 212, the reconstruction unit 214, the loop filter 220, the decoded picture buffer (DPB) 230, the inter-prediction unit 244, and the intra-prediction unit 254 are also referred to as forming the “built-in decoder” of the video encoder 20. Accordingly, the inverse quantization unit 310 may be identical in function to the inverse quantization unit 210, the inverse transform processing unit 312 may be identical in function to the inverse transform processing unit 212, the reconstruction unit 314 may be identical in function to the reconstruction unit 214, the loop filter 320 may be identical in function to the loop filter 220, and the decoded picture buffer 330 may be identical in function to the decoded picture buffer 230. Therefore, the explanations provided for the respective units and functions of the video encoder 20 apply correspondingly to the respective units and functions of the video decoder 30.

[0173] Entropy decoding The entropy decoding unit 304 is configured to parse the bitstream 21 (or in general, the encoded picture data 21) and, for example, to perform entropy decoding on the encoded picture data 21. This yields, e.g., quantized coefficients 309 and/or decoded coding parameters (not shown in Figure 3), such as inter-prediction parameters (e.g., reference picture index and motion vector), intra-prediction parameters (e.g., intra-prediction mode or index), transform parameters, quantization parameters, loop filter parameters, and/or other syntax elements. The entropy decoding unit 304 may be configured to apply decoding algorithms or schemes corresponding to the encoding schemes described with respect to the entropy coding unit 270 of the encoder 20. The entropy decoding unit 304 may be further configured to provide the inter-prediction parameters, intra-prediction parameters, and/or other syntax elements to the mode selection unit 360, and other parameters to other units of the decoder 30. The video decoder 30 may receive the syntax elements at the video slice level and/or the video block level.

[0174] Inverse quantization The inverse quantization unit 310 may be configured to receive quantization parameters (QP) (or in general, information related to inverse quantization) and quantized coefficients from the encoded picture data 21 (e.g., by parsing and/or decoding, e.g., by the entropy decoding unit 304). Based on the quantization parameters, it applies inverse quantization to the decoded quantized coefficients 309 to obtain dequantized coefficients 311, which may also be referred to as transform coefficients 311. The inverse quantization process may include use of a quantization parameter determined by the video encoder 20 for each video block in the video slice to determine a degree of quantization and, likewise, the degree of inverse quantization that should be applied.

[0175] Inverse transform The inverse transform processing unit 312 may be configured to receive the dequantized coefficients 311, also referred to as transform coefficients 311, and to apply a transform to them in order to obtain a reconstructed residual block 313 in the sample domain. The reconstructed residual block 313 may also be referred to as the transform block 313. The transform may be an inverse transform, e.g., an inverse DCT, an inverse DST, an inverse integer transform, or a conceptually similar inverse transform process. The inverse transform processing unit 312 may be further configured to receive transform parameters or corresponding information from the encoded picture data 21 (e.g., by parsing and/or decoding, e.g., by the entropy decoding unit 304) to determine the transform to be applied to the dequantized coefficients 311.

[0176] Reconstruction The reconstruction unit 314 (e.g., an adder or summer 314) may be configured to add the reconstructed residual block 313 to the prediction block 365 to obtain a reconstructed block 315 in the sample domain, e.g., by adding the sample values of the reconstructed residual block 313 and the sample values of the prediction block 365.

[0177] Filtering: The loop filter unit 320 (either within or after the coding loop) is configured to filter the reconstructed block 315 to obtain the filtered block 321, for example, to smooth pixel transitions or otherwise improve video quality. The loop filter unit 320 may comprise one or more loop filters, such as a deblocking filter or a sample-adaptive offset (SAO) filter. It may also comprise one or more other filters, such as a bilateral filter, an adaptive loop filter (ALF), a sharpening filter, a smoothing filter, or a collaborative filter, or any combination of these. Although Figure 3 shows the loop filter unit 320 as an in-loop filter, in other configurations the loop filter unit 320 may be implemented as a post-loop filter.

[0178] Decoded picture buffer: The decoded video blocks 321 of a picture are then stored in the decoded picture buffer 330. The decoded picture buffer 330 stores the decoded pictures 331 as reference pictures for subsequent motion compensation of other pictures and/or for output and display.

[0179] The decoder 30 is configured to output the decoded picture 331, for example, via the output 332, for presentation to or viewing by the user.

[0180] Prediction: The inter-prediction unit 344 may be identical to the inter-prediction unit 244 (in particular, to the motion compensation unit), and the intra-prediction unit 354 may be identical in function to the intra-prediction unit 254. These units perform partitioning or partitioning decisions and prediction. They base this work on the partitioning and/or prediction parameters, or on the respective information received from the encoded picture data 21 (for example, by parsing and/or decoding performed by the entropy decoding unit 304). The mode selection unit 360 may be configured to perform prediction (intra-prediction or inter-prediction) block by block to obtain the prediction block 365. This prediction is based on reconstructed pictures, blocks, or respective samples (filtered or unfiltered).

[0181] When a video slice is coded as an intra-coded (I) slice, the intra-prediction unit 354 of the mode selection unit 360 is configured to generate a prediction block 365 for the picture block of the current video slice based on the signaled intra-prediction mode and data from previously decoded blocks of the current picture. When a video picture is coded as an inter-coded (i.e., B or P) slice, the inter-prediction unit 344 (e.g., motion compensation unit) of the mode selection unit 360 is configured to produce a prediction block 365 for the video block of the current video slice based on motion vectors and other syntax elements received from the entropy decoding unit 304. In the case of inter-prediction, the prediction block may be produced from one of the reference pictures in one of the reference picture lists. The video decoder 30 may construct reference frame lists, i.e., list 0 and list 1, using default construction techniques based on the reference pictures stored in the DPB 330.

[0182] The mode selection unit 360 is configured to determine prediction information for video blocks in the current video slice by parsing motion vectors and other syntax elements, and to use the prediction information to create prediction blocks for the current video block being decoded. For example, the mode selection unit 360 uses some of the received syntax elements to determine the prediction mode used to code the video blocks in the video slice (e.g., intra-prediction or inter-prediction), the inter-prediction slice type (e.g., B-slice, P-slice, or GPB-slice), configuration information for one or more of the reference picture lists for the slice, the motion vector for each inter-coded video block in the slice, the inter-prediction status for each inter-coded video block in the slice, and other information for decoding the video blocks in the current video slice.

[0183] Other variations of the video decoder 30 may be used to decode the encoded picture data 21. For example, the decoder 30 can produce the output video stream without the loop filter unit 320. As another example, a non-transform-based decoder 30 can inverse-quantize the residual signal directly for certain blocks or frames, without the inverse transform processing unit 312. In another implementation, the video decoder 30 can combine the inverse quantization unit 310 and the inverse transform processing unit 312 into a single unit.

[0184] Figure 4 is a schematic diagram of a video coding device 400 according to an embodiment of the present disclosure. The video coding device 400 is suitable for implementing the disclosed embodiments as described herein. In one embodiment, the video coding device 400 may be a decoder, such as the video decoder 30 of Figure 1A, or an encoder, such as the video encoder 20 of Figure 1A.

[0185] The video coding device 400 comprises the following components:
- an ingress port 410 (or input port 410) and a receiver unit (Rx) 420 for receiving data;
- a processor, logic unit, or central processing unit (CPU) 430 for processing the data;
- a transmitter unit (Tx) 440 and an egress port 450 (or output port 450) for transmitting the data; and
- a memory 460 for storing the data.
The video coding device 400 may also comprise optical-to-electrical (OE) components and electrical-to-optical (EO) components for the egress or ingress of optical or electrical signals. These components are coupled to the ingress port 410, the receiver unit 420, the transmitter unit 440, and the egress port 450.

[0186] The processor 430 is implemented by hardware and software. The processor 430 may be implemented as one or more CPU chips, cores (e.g., as a multi-core processor), FPGAs, ASICs, and DSPs. The processor 430 communicates with the ingress port 410, the receiver unit 420, the transmitter unit 440, the egress port 450, and the memory 460. The processor 430 comprises a coding module 470. The coding module 470 implements the disclosed embodiments described above. For example, the coding module 470 implements, processes, prepares, or provides the various coding operations. Including the coding module 470 therefore provides a substantial improvement to the functionality of the video coding device 400 and effects a transformation of the video coding device 400 to a different state. Alternatively, the coding module 470 is implemented as instructions stored in the memory 460 and executed by the processor 430.

[0187] Memory 460 may comprise one or more disks, tape drives, and solid-state drives and may be used as an overflow data storage device for storing a program when such a program is selected for execution, and for storing instructions and data read during program execution. Memory 460 may be, for example, volatile and / or non-volatile, and may be read-only memory (ROM), random access memory (RAM), ternary content-addressable memory (TCAM), and / or static random access memory (SRAM).

[0188] Figure 5 is a simplified block diagram of a device 500 that can be used as one or both of the source device 12 and destination device 14 from Figure 1, according to an exemplary embodiment. The device 500 can implement the techniques of this application described above. The device 500 can take the form of a computing system comprising multiple computing devices, or the form of a single computing device, such as a mobile phone, tablet computer, laptop computer, notebook computer, or desktop computer.

[0189] The processor 502 in the device 500 may be a central processing unit. Alternatively, the processor 502 may be any other type of device, existing or to be developed in the future, capable of manipulating or processing information. The disclosed implementation may be carried out using a single processor, e.g., processor 502, as shown in the figure, but advantages in speed and efficiency may be achieved using two or more processors.

[0190] The memory 504 in the device 500 may, in one implementation, be a read-only memory (ROM) device or a random access memory (RAM) device. Any other suitable type of storage device may be used as memory 504. Memory 504 may contain code and data 506 accessed by the processor 502 using the bus 512. Memory 504 may further contain an operating system 508 and an application program 510, the application program 510 including at least one program that enables the processor 502 to perform the method described herein. For example, the application program 510 may include applications 1 to N, the applications 1 to N further including a video coding application that performs the method described herein. The device 500 may also include additional memory in the form of secondary storage 514, which may be a memory card used with a mobile computing device, for example. Since a video communication session may contain a considerable amount of information, it may be stored entirely or partially in the secondary storage 514 and loaded into memory 504 as needed for processing.

[0191] The device 500 may also include one or more output devices, such as a display 518. The display 518 may, in one example, be a touch-sensitive display, which combines the display with a touch-sensitive element capable of operating to sense touch input. The display 518 may be coupled to the processor 502 via a bus 512. In addition to the display 518, or as an alternative to the display 518, other output devices may be provided that enable a user to program or otherwise use the device 500. When the output device is a display or includes a display, the display may be implemented in a variety of ways, including by a liquid crystal display (LCD), a cathode ray tube (CRT) display, a plasma display, or a light-emitting diode (LED) display such as an organic LED (OLED) display.

[0192] The device 500 may also include, or be in communication with, an image sensing device 520. This may be a camera or any other existing or future image sensing device capable of sensing an image, such as the image of a user operating the device 500. The image sensing device 520 may be positioned so that it faces the user operating the device 500. In one example, the position and optical axis of the image sensing device 520 may be configured so that its field of view includes an area directly adjacent to the display 518 from which the display 518 is visible.

[0193] The device 500 may also include, or communicate with, a sound sensing device 522, for example, a microphone, or any other existing or future sound sensing device capable of sensing sound near the device 500. The sound sensing device 522 may be positioned to face a user operating the device 500 and may be configured to receive sound, for example, voice or other utterances made by the user while the user is operating the device 500.

[0194] Although Figure 5 shows the processor 502 and the memory 504 of the device 500 as integrated into a single unit, other configurations may be used. The operations of the processor 502 may be distributed across multiple machines (each having one or more processors) that can be coupled directly or across a local area network or another network. The memory 504 may be distributed across multiple machines, such as network-based memory or memory in multiple machines performing the operations of the device 500. Although depicted here as a single bus, the bus 512 of the device 500 may consist of multiple buses. Furthermore, the secondary storage 514 may be directly coupled to the other components of the device 500 or accessed over a network. It may comprise a single integrated unit, such as a memory card, or multiple units, such as multiple memory cards. The device 500 can thus be implemented in a wide variety of configurations.

[0195] Definitions and glossary of acronyms
JEM: Joint Exploration Model (software codebase for future video coding exploration)
JVET: Joint Video Experts Team
LUT: Look-Up Table
QT: Quadtree
QTBT: Quadtree plus Binary Tree
RDO: Rate-Distortion Optimization
ROM: Read-Only Memory
VTM: VVC Test Model
VVC: Versatile Video Coding, the standardization project developed by JVET
CTU / CTB: Coding Tree Unit / Coding Tree Block
CU / CB: Coding Unit / Coding Block
PU / PB: Prediction Unit / Prediction Block
TU / TB: Transform Unit / Transform Block
HEVC: High Efficiency Video Coding

[0196] Video coding schemes such as H.264/AVC and HEVC are designed according to the well-known principle of block-based hybrid video coding. Under this principle, a picture is first partitioned into blocks, and each block is then predicted using intra-picture or inter-picture prediction.

[0197] Several video coding standards since H.261 belong to the group of "lossy hybrid video codecs". These codecs combine spatial and temporal prediction in the sample domain with 2D transform coding for applying quantization in the transform domain. Each picture of a video sequence is typically partitioned into a set of non-overlapping blocks, and coding is typically performed at the block level. In other words, at the encoder, the video is typically processed, i.e., encoded, at the block (picture block) level as follows:
- a prediction block is generated, for example, by spatial (intra-picture) prediction and temporal (inter-picture) prediction;
- the prediction block is subtracted from the current block (the block currently being processed or to be processed) to obtain a residual block; and
- the residual block is transformed and quantized in the transform domain to reduce the amount of data to be transmitted (compression).
At the decoder, processing that is the inverse of the encoder's is partially applied to the encoded or compressed block to reconstruct the current block for representation. Furthermore, the encoder replicates the decoder processing loop. Both therefore generate identical predictions (e.g., intra- and inter-predictions) and/or reconstructions for processing, i.e., coding, subsequent blocks.
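The hybrid coding loop of [0197] can be illustrated with a toy one-dimensional example. This sketch assumes an identity transform and a hypothetical uniform quantizer with a fixed step size. Its only purpose is to show two things: by replicating the decoder loop, the encoder obtains exactly the same reconstruction as the decoder, and that reconstruction is lossy compared with the original block.

```python
def quantize(value: int, step: int) -> int:
    """Uniform scalar quantizer, rounding to nearest with ties away from zero."""
    sign = -1 if value < 0 else 1
    return sign * ((abs(value) + step // 2) // step)


def decode_block(levels, prediction, step):
    """Decoder side: dequantize the levels and add them to the prediction."""
    return [p + lvl * step for p, lvl in zip(prediction, levels)]


def encode_block(block, prediction, step):
    """Encoder side: residual -> levels, plus the replicated decoder reconstruction."""
    residual = [b - p for b, p in zip(block, prediction)]
    levels = [quantize(r, step) for r in residual]
    # The encoder runs the decoder loop itself so that later predictions match.
    recon = decode_block(levels, prediction, step)
    return levels, recon


block, prediction = [100, 104, 98, 120], [100, 100, 100, 100]
levels, enc_recon = encode_block(block, prediction, step=4)
dec_recon = decode_block(levels, prediction, step=4)
print(levels, enc_recon, dec_recon == enc_recon)
# [0, 1, -1, 5] [100, 104, 96, 120] True
```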

[0198] As used herein, the term "block" may refer to a portion of a picture or a frame. For convenience of explanation, embodiments of the present invention are described herein with reference to the reference software of High Efficiency Video Coding (HEVC) or Versatile Video Coding (VVC). This software was developed by the Joint Collaborative Team on Video Coding (JCT-VC) of the ITU-T Video Coding Experts Group (VCEG) and the ISO/IEC Moving Picture Experts Group (MPEG). Those skilled in the art will understand that embodiments of the present invention are not limited to HEVC or VVC. Embodiments may refer to CUs, PUs, and TUs.

In HEVC, a CTU is split into CUs by using a quadtree structure denoted as a coding tree. The decision whether to code a picture area using inter-picture (temporal) or intra-picture (spatial) prediction is made at the CU level. Each CU can be further split into one, two, or four PUs according to the PU splitting type. Inside one PU, the same prediction process is applied, and the relevant information is transmitted to the decoder on a PU basis. After the residual block is obtained by applying the prediction process based on the PU splitting type, a CU can be partitioned into transform units (TUs). This partitioning follows another quadtree structure similar to the coding tree for the CU.

In recent developments of video compression technology, quadtree plus binary tree (QTBT) partitioning is used to partition a coding block. In the QTBT block structure, a CU can have either a square or a rectangular shape. For example, a coding tree unit (CTU) is first partitioned by a quadtree structure, and the quadtree leaf nodes are further partitioned by a binary tree structure. The binary tree leaf nodes are called coding units (CUs), and this segmentation is used for prediction and transform processing without any further partitioning. This means that CUs, PUs, and TUs have the same block size in the QTBT coding block structure.
In parallel, multiple partition types, such as ternary-tree partitioning, have also been proposed for use together with the QTBT block structure.
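The QTBT structure of [0198] can be sketched as a recursive split in which quadtree splits come first and binary splits follow. No quadtree split is allowed below a binary split. In this sketch, the split decisions are hypothetical callbacks standing in for the encoder's rate-distortion decisions, and ternary splits are not modeled.

```python
from typing import Callable, List, Optional, Tuple

Block = Tuple[int, int, int, int]  # (x, y, width, height)


def qtbt_leaves(block: Block,
                qt_split: Callable[[Block], bool],
                bt_split: Callable[[Block], Optional[str]],
                allow_qt: bool = True) -> List[Block]:
    """Return the leaf CUs of a QTBT partitioning of `block`.

    Quadtree splits are only allowed before the first binary split, so a
    binary-tree node is never split by the quadtree again.
    """
    x, y, w, h = block
    if allow_qt and qt_split(block):
        hw, hh = w // 2, h // 2
        leaves: List[Block] = []
        for bx, by in ((x, y), (x + hw, y), (x, y + hh), (x + hw, y + hh)):
            leaves += qtbt_leaves((bx, by, hw, hh), qt_split, bt_split, True)
        return leaves
    direction = bt_split(block)  # "ver", "hor", or None for a leaf CU
    if direction == "ver":
        hw = w // 2
        halves = [(x, y, hw, h), (x + hw, y, hw, h)]
    elif direction == "hor":
        hh = h // 2
        halves = [(x, y, w, hh), (x, y + hh, w, hh)]
    else:
        return [block]
    return [leaf for half in halves
            for leaf in qtbt_leaves(half, qt_split, bt_split, False)]


# Hypothetical decisions: quadtree down to 64x64, then one vertical binary split.
cus = qtbt_leaves((0, 0, 128, 128),
                  qt_split=lambda b: b[2] > 64,
                  bt_split=lambda b: "ver" if b[2] == 64 else None)
print(len(cus), cus[0])  # 8 (0, 0, 32, 64)
```

The resulting CUs are rectangular (32x64), which is exactly what plain quadtree partitioning cannot produce.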

[0199] The ITU-T VCEG (Q6 / 16) and ISO / IEC MPEG (JTC 1 / SC29 / WG11) are exploring the potential need for standardization of future video coding technologies that offer significantly greater compression capabilities than the current HEVC standard (including its current and upcoming extensions for screen content coding and high dynamic range coding). The groups are collaborating in this exploration activity within a joint research initiative called the Joint Video Exploration Team (JVET) to evaluate compression technology designs proposed by their experts in this area.

[0200] For directional intra-prediction, different prediction angles are available, ranging from the diagonal-up direction to the diagonal-down direction. The prediction angle is defined by an offset value p_ang on a 32-sample grid. Figure 6 visualizes the association of p_ang with the corresponding intra-prediction mode for the vertical prediction modes. For the horizontal prediction modes, the scheme is flipped to the vertical direction and the p_ang values are assigned accordingly. As mentioned above, all angular prediction modes are available for all applicable intra-prediction block sizes, and they all use the same 32-sample grid to define the prediction angle. The distribution of the p_ang values over the 32-sample grid in Figure 6 shows an increased resolution of the prediction angles around the vertical direction and a coarser resolution towards the diagonal directions. The same applies to the horizontal direction. This design stems from the observation that, in much video content, approximately horizontal and vertical structures play a more significant role than diagonal structures.
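To illustrate the non-uniform angle resolution described in [0200], the following sketch lists the p_ang values that HEVC uses for angular modes 2 to 34 of the 35-mode scheme. The HEVC specification calls these values intraPredAngle. The sketch then computes the steps between neighbouring vertical modes. The table values are taken from HEVC; the helper names are hypothetical.

```python
# HEVC p_ang (intraPredAngle) for angular modes 2..34, in 1/32-sample units.
P_ANG = [32, 26, 21, 17, 13, 9, 5, 2, 0, -2, -5, -9, -13, -17, -21, -26,
         -32, -26, -21, -17, -13, -9, -5, -2, 0, 2, 5, 9, 13, 17, 21, 26, 32]


def p_ang(mode: int) -> int:
    """Return p_ang for an HEVC angular mode (10 = horizontal, 26 = vertical)."""
    if not 2 <= mode <= 34:
        raise ValueError("HEVC angular modes are 2..34")
    return P_ANG[mode - 2]


# Steps between neighbouring modes from vertical (26) to the diagonal (34):
# small near the vertical axis, larger towards the diagonal.
steps = [p_ang(m + 1) - p_ang(m) for m in range(26, 34)]
print(steps)  # [2, 3, 4, 4, 4, 4, 5, 6]
```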

[0201] For the horizontal and vertical prediction directions, selecting the samples to be used for prediction is straightforward, but for angular prediction this task requires more effort. Consider modes 11 to 25, where the current block B_c is predicted in an angular direction from the set of reference samples p_ref (also called the primary reference side). For these modes, samples from both the vertical and the horizontal portion of p_ref may be involved. Determining the location of each sample on either branch of p_ref requires some computational effort. For this reason, a unified one-dimensional prediction reference is designed for HEVC intra prediction. FIG. 7 visualizes the method. Before the actual prediction operation is performed, the set of reference samples p_ref is mapped to a one-dimensional vector p_1,ref. The projection used for the mapping depends on the direction indicated by the intra-prediction angle of each intra-prediction mode. Only the reference samples from the portion of p_ref that will be used for prediction are mapped to p_1,ref. FIGS. 8 and 9 show the actual mapping of the reference samples to p_1,ref for each angular prediction mode, for the horizontal and vertical angular prediction directions, respectively. The reference sample set p_1,ref is constructed once for the block of prediction samples. The prediction is then derived from two adjacent reference samples in the set, as detailed below. As FIGS. 8 and 9 show, the one-dimensional reference sample set is not necessarily filled completely for all intra-prediction modes. Only the locations within the projection range of the corresponding intra-prediction direction are included in the set.
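The construction of p_1,ref in [0201] can be sketched for a vertical angular mode under HEVC conventions. The inputs are the corner sample p[-1][-1] and the top row and left column of a square N x N block. For negative p_ang, the HEVC inverse-angle values project left-column samples onto the extension of the top row. The dict-based layout and the function name are assumptions made for readability.

```python
# HEVC inverse angles for negative p_ang (about 256 * 32 / p_ang).
INV_ANGLE = {-2: -4096, -5: -1638, -9: -910, -13: -630,
             -17: -482, -21: -390, -26: -315, -32: -256}


def build_ref_vertical(corner, top, left, n, angle):
    """Build p_1,ref for a vertical angular mode as a dict {index: sample}.

    corner = p[-1][-1]; top[i] = p[i][-1] and left[j] = p[-1][j] for
    i, j = 0..2n-1. Index 0 holds the corner and indices 1..2n hold the
    top row (a superset of what negative angles actually use).
    """
    ref = {0: corner}
    for x in range(1, 2 * n + 1):
        ref[x] = top[x - 1]
    last = (n * angle) >> 5  # most negative index reached by the projection
    if angle < 0 and last < -1:
        inv = INV_ANGLE[angle]
        # Project left-column samples onto indices last..-1.
        for x in range(last, 0):
            ref[x] = left[-1 + ((x * inv + 128) >> 8)]
    return ref


ref = build_ref_vertical(0, list(range(10, 18)), list(range(20, 28)), 4, -32)
print(min(ref), ref[-1], ref[-4])  # -4 20 23
```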

[0202] Prediction for both the horizontal and the vertical prediction modes is performed in the same way, simply by swapping the x and y coordinates of the block. Prediction from p_1,ref is performed with 1/32-sample accuracy. For the sample at position (x, y), the sample offset i_idx in p_1,ref and the weighting factor i_fact are determined from the value of the angle parameter p_ang. The derivation for the vertical modes is given here; the derivation for the horizontal modes follows by swapping x and y.

[0203]

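$$
i_{\mathrm{idx}} = \big((y+1)\cdot p_{\mathrm{ang}}\big) \mathbin{>>} 5, \qquad
i_{\mathrm{fact}} = \big((y+1)\cdot p_{\mathrm{ang}}\big) \mathbin{\&} 31
$$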

[0204] If i_fact is not equal to 0, the prediction does not fall exactly on a full sample location in p_1,ref. In that case, linear weighting between the two adjacent sample locations in p_1,ref is performed as follows, where 0 ≤ x, y < N_c:

[0205]

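$$
p_{x,y} = \Big( (32 - i_{\mathrm{fact}}) \cdot p_{1,\mathrm{ref}}[x + i_{\mathrm{idx}} + 1] + i_{\mathrm{fact}} \cdot p_{1,\mathrm{ref}}[x + i_{\mathrm{idx}} + 2] + 16 \Big) \mathbin{>>} 5
$$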

[0206] Note that the values of i_idx and i_fact depend only on y. They therefore need to be calculated only once per row (for the vertical prediction modes).
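The following sketch applies the per-row derivation of i_idx and i_fact and the linear weighting between two adjacent samples in p_1,ref described in [0202] to [0206]. It targets a vertical mode and uses the HEVC form i_idx = ((y + 1) * p_ang) >> 5 and i_fact = ((y + 1) * p_ang) & 31. The sketch assumes a non-negative p_ang, so that p_1,ref is simply the corner sample followed by the top row. The function name is hypothetical.

```python
def predict_vertical(ref, n, angle):
    """Angular prediction of an n x n block for a vertical mode from p_1,ref.

    ref[0] is the corner sample and ref[1..2n] is the top reference row;
    angle is p_ang in 1/32-sample units and is assumed to be >= 0 here.
    """
    pred = [[0] * n for _ in range(n)]
    for y in range(n):
        i_idx = ((y + 1) * angle) >> 5   # integer offset, computed once per row
        i_fact = ((y + 1) * angle) & 31  # 1/32 weight, computed once per row
        for x in range(n):
            a = ref[x + i_idx + 1]
            if i_fact == 0:
                pred[y][x] = a  # the prediction falls on a full sample
            else:
                b = ref[x + i_idx + 2]
                pred[y][x] = ((32 - i_fact) * a + i_fact * b + 16) >> 5
    return pred


top = [100, 132, 164, 196, 228, 255, 255, 255]
ref = [0] + top  # corner followed by the top row
print(predict_vertical(ref, 4, 0)[0])  # [100, 132, 164, 196]: pure vertical copy
print(predict_vertical(ref, 4, 2)[0][0])  # 102: weighted between 100 and 132
```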

[0207] VTM-1.0 (VVC Test Model) uses 35 intra modes, whereas BMS (Benchmark Set) uses 67 intra modes. Intra prediction is a mechanism used in many video coding frameworks to improve compression efficiency in cases where only a given frame can be involved.

[0208] Figure 10A shows an example of 67 intra-prediction modes, for example as proposed for VVC. The 67 intra-prediction modes comprise a planar mode (index 0), a DC mode (index 1), and angular modes with indices 2 to 66. The bottom-left angular mode in Figure 10A has index 2, and the index increases up to index 66 for the top-right angular mode in Figure 10A.

[0209] As shown in Figure 10B, the latest version of VVC has several modes corresponding to diagonal intra-prediction directions, including the wide-angle modes (illustrated as dashed lines). For any of these modes, if the corresponding position within the block side is fractional, the set of adjacent reference samples should be interpolated to predict the samples within the block. HEVC and VVC use linear interpolation between two adjacent reference samples. JEM uses a more sophisticated 4-tap interpolation filter. The filter coefficients are selected to be either Gaussian or cubic, depending on the width or height value. The decision on whether to use the width or the height is harmonized with the decision on the primary reference side selection. When the intra-prediction mode is greater than or equal to the diagonal mode, the top side of the reference samples is selected as the primary reference side, and the width value determines the interpolation filter. Otherwise, the primary reference side is selected from the left side of the block, and the height controls the filter selection process. Specifically, if the selected side length is 8 samples or less, a cubic 4-tap interpolation filter is applied; otherwise, a 4-tap Gaussian filter is used.
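The JEM filter-selection rule of [0209] reduces to comparing one block dimension with 8 samples. This sketch assumes the 67-mode numbering of Figure 10A, with the diagonal mode at index 34. The function name and string return values are hypothetical.

```python
DIA_IDX = 34  # diagonal mode in the 67-mode numbering of Figure 10A


def jem_intra_filter(mode: int, width: int, height: int) -> str:
    """Pick the JEM 4-tap filter type from the side tied to the primary reference side."""
    # Modes at or above the diagonal use the top side, so the width decides;
    # other modes use the left side, so the height decides.
    side = width if mode >= DIA_IDX else height
    return "cubic" if side <= 8 else "gaussian"


print(jem_intra_filter(50, 32, 4), jem_intra_filter(10, 32, 4))  # gaussian cubic
```

For the 32x4 block of FIG. 12, modes at or above the diagonal use the width (32) and therefore the Gaussian filter. Modes below the diagonal use the height (4) and therefore the cubic filter.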

[0210] The specific filter coefficients used in JEM are given in Table 1 (Table 5). The predicted samples are calculated by convolving with the coefficients selected from Table 1 (Table 5) according to the subpixel offset and filter type, as follows:

[0211]

$$
s(x) = \Big( \sum_{i=0}^{3} \mathrm{ref}_{i+x} \cdot c_i + 128 \Big) \mathbin{>>} 8
$$

[0212] In this expression, ">>" indicates a bitwise right shift operation.
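The convolution of [0210] to [0212] can be sketched as a generic 4-tap filter with a rounding offset and a right shift. The coefficient sets below are illustrative values only, each summing to 256 for 8-bit precision. They are not copied from Table 1 (Table 5). For the 6-bit filters of Tables 2 and 3, the same function would presumably be called with shift=6 and coefficients summing to 64.

```python
def interpolate_4tap(ref, x, coeffs, shift=8):
    """Compute s(x) = (sum_i ref[i + x] * c_i + 2^(shift - 1)) >> shift."""
    acc = sum(ref[i + x] * c for i, c in enumerate(coeffs))
    return (acc + (1 << (shift - 1))) >> shift


# Illustrative 8-bit coefficient sets (each sums to 256); not taken from Table 5.
IDENTITY = (0, 256, 0, 0)            # integer position: copies ref[x + 1]
HALF_PEL_CUBIC = (-16, 144, 144, -16)  # half-sample position

ref = [10, 20, 30, 40]
print(interpolate_4tap(ref, 0, IDENTITY), interpolate_4tap(ref, 0, HALF_PEL_CUBIC))  # 20 25
```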

[0213] The offset between the sample to be predicted (or "predicted sample") in the current block and the interpolated sample position may have an integer part and a non-integer part if the offset has a subpixel resolution such as 1 / 32 pixel. In Table 1 (Table 5), and in Tables 2 (Table 6) and 3 (Table 7), the column "Subpixel Offset" refers to the non-integer part of the offset, such as a fractional offset, a fractional part of the offset, or a fractional sample position.

[0214] If a cubic filter is selected, the predicted samples are further clipped to a range of values that is either defined in the SPS or derived from the bit depth of the selected component.
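Because a cubic filter has negative taps, its output can overshoot the input range, which is why [0214] clips the result. This sketch assumes the range is derived from the bit depth rather than signalled in the SPS. It uses an illustrative half-sample coefficient set that is not taken from Table 1 (Table 5).

```python
def clip_sample(value: int, bit_depth: int) -> int:
    """Clip a predicted sample to [0, 2^bit_depth - 1]."""
    return min(max(value, 0), (1 << bit_depth) - 1)


HALF_PEL_CUBIC = (-16, 144, 144, -16)  # illustrative, negative outer taps
ref = [0, 255, 255, 0]
raw = (sum(r * c for r, c in zip(ref, HALF_PEL_CUBIC)) + 128) >> 8
print(raw, clip_sample(raw, 8))  # 287 255
```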

[0215] [Table 5]

[0216] Another set of interpolation filters with 6-bit precision is presented in Table 2 (Table 6).

[0217] [Table 6]

[0218] The intra-prediction sample is calculated by convolving with coefficients selected from Table 2 (Table 6) according to the subpixel offset and filter type, as follows:

[0219]

$$
s(x) = \Big( \sum_{i=0}^{3} \mathrm{ref}_{i+x} \cdot c_i + 32 \Big) \mathbin{>>} 6
$$

[0220] In this expression, ">>" indicates a bitwise right shift operation.

[0221] Another set of interpolation filters with 6-bit precision is presented in Table 3 (Table 7).

[0222] [Table 7]

[0223] Figure 11 shows a schematic diagram of the intra-prediction modes used in the HEVC UIP scheme. For luma blocks, the intra-prediction modes may comprise up to 36 modes: 3 non-directional modes and 33 directional modes. The non-directional modes may comprise a planar prediction mode, a mean (DC) prediction mode, and a chroma-from-luma (LM) prediction mode. The planar prediction mode may perform prediction by assuming a block amplitude surface with horizontal and vertical slopes derived from the block boundaries. The DC prediction mode may perform prediction by assuming a flat block surface with a value matching the average value of the block boundary. The LM prediction mode may perform prediction by assuming that the chroma values of a block match the luma values of the block. The directional modes may perform prediction based on adjacent blocks, as shown in Figure 11.
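The planar and DC modes of [0223] can be sketched with the HEVC equations for a square N x N block, where N is a power of two. The inputs top and left hold the neighbouring reconstructed samples, with top[N] the top-right sample and left[N] the bottom-left sample. HEVC's DC boundary smoothing and the LM mode are not modeled, and the function names are hypothetical.

```python
def dc_pred(top, left, n):
    """DC mode: fill the block with the rounded mean of the n top and n left samples."""
    log2n = n.bit_length() - 1
    dc = (sum(top[:n]) + sum(left[:n]) + n) >> (log2n + 1)
    return [[dc] * n for _ in range(n)]


def planar_pred(top, left, n):
    """Planar mode: average of a horizontal and a vertical linear interpolation."""
    log2n = n.bit_length() - 1
    top_right, bottom_left = top[n], left[n]
    return [
        [((n - 1 - x) * left[y] + (x + 1) * top_right
          + (n - 1 - y) * top[x] + (y + 1) * bottom_left + n) >> (log2n + 1)
         for x in range(n)]
        for y in range(n)
    ]


print(dc_pred([10] * 4, [20] * 4, 4)[0])  # [15, 15, 15, 15]
print(planar_pred([0, 0, 0, 0, 64], [0] * 5, 4)[0])  # ramps towards the top-right sample
```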

[0224] H.264/AVC and HEVC specify that a low-pass filter may be applied to the reference samples before they are used in the intra-prediction process. Whether a reference sample filter is used is determined by the intra-prediction mode and the block size. This mechanism is sometimes called mode-dependent intra smoothing (MDIS). There are also several methods related to MDIS. For example, the adaptive reference sample smoothing (ARSS) method may signal whether the prediction samples are filtered, either explicitly or implicitly. Explicit signalling includes a flag in the bitstream. Implicit signalling uses, for example, data hiding to avoid placing a flag in the bitstream, which reduces signaling overhead. In this case, the encoder may decide on smoothing by testing the rate-distortion (RD) cost for all potential intra-prediction modes.
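In HEVC, the low-pass reference filter mentioned in [0224] is a [1, 2, 1]/4 filter applied along the reference line, with the end samples left unfiltered. In this sketch, the mode- and size-dependent (MDIS) or signalled (ARSS) decision is reduced to a hypothetical boolean argument.

```python
def smooth_121(samples):
    """Apply a [1, 2, 1] / 4 low-pass filter; the two end samples stay unfiltered."""
    out = list(samples)
    for i in range(1, len(samples) - 1):
        out[i] = (samples[i - 1] + 2 * samples[i] + samples[i + 1] + 2) >> 2
    return out


def maybe_smooth(samples, apply_filter: bool):
    """apply_filter stands in for the MDIS rule or a signalled (ARSS) flag."""
    return smooth_121(samples) if apply_filter else list(samples)


print(maybe_smooth([0, 0, 8, 0, 0], True))  # [0, 2, 4, 2, 0]
```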

[0225] As shown in FIG. 10B, the latest version of VVC has several modes corresponding to diagonal intra-prediction directions. For any of these modes, when the corresponding position within the block side is fractional, the set of adjacent reference samples should be interpolated to predict the samples within the block. HEVC and VVC use linear interpolation between two adjacent reference samples. JEM uses a more sophisticated 4-tap interpolation filter. The filter coefficients are selected to be either Gaussian or cubic, depending on the width or height value. The decision on whether to use the width or the height is harmonized with the decision on the primary reference side selection. When the intra-prediction mode is greater than or equal to the diagonal mode, the top side of the reference samples is selected as the primary reference side, and the width value determines the interpolation filter. Otherwise, the primary reference side is selected from the left side of the block, and the height controls the filter selection process. Specifically, when the selected side length is 8 samples or less, a cubic 4-tap interpolation filter is applied; otherwise, a 4-tap Gaussian filter is used.

[0226] An example of interpolation filter selection for modes smaller and larger than the diagonal mode (shown as 45°) in the case of a 32×4 block is shown in FIG. 12.

[0227] In VVC, a partitioning mechanism based on both quad-trees and binary-trees, called QTBT, is used. As shown in FIG. 13, the QTBT partitioning can provide not only square but also rectangular blocks. Of course, some signaling overhead and increased computational complexity on the encoder side are the price of the QTBT partitioning compared to the conventional quadtree-based partitioning used in the HEVC / H.265 standard. Nevertheless, the QTBT-based partitioning gives better segmentation characteristics and thus demonstrates significantly higher coding efficiency than the conventional quadtree.

[0228] However, in its current state, VVC applies the same filter to both side portions (left and upper side portions) of the reference samples. Regardless of whether the block is oriented vertically or horizontally, the reference sample filter is the same for both reference sample side portions.

[0229] In this specification, the terms "vertically oriented block" ("block with a vertical orientation") and "horizontally oriented block" ("block with a horizontal orientation") are applied to rectangular blocks generated by the QTBT framework. These terms have the meanings illustrated in FIG. 14.

[0230] For a directional intra-prediction mode with a positive subsample offset, the memory size needed to store the reference sample values has to be determined. However, this size depends not only on the dimensions of the block of prediction samples but also on the processing subsequently applied to these samples. Specifically, for a positive subsample offset, interpolation filtering requires a larger primary reference side than when no interpolation filtering is applied. Interpolation filtering is performed by convolving the reference samples with a filter kernel. The increase is therefore caused by the additional samples the convolution needs to compute its results at the leftmost and rightmost parts of the primary reference side.
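The growth of the reference side described in [0230] follows from how convolution works: a T-tap filter needs T - 1 more input samples than the number of output positions. This generic sketch only illustrates that effect. It does not reproduce the exact sizing rule of [0237] to [0240], and the names are hypothetical.

```python
def samples_needed(num_outputs: int, taps: int) -> int:
    """Input samples required so that every output position has a full filter support."""
    return num_outputs + taps - 1


def convolve_valid(samples, kernel, shift):
    """Integer convolution evaluated only where the whole kernel fits."""
    t = len(kernel)
    rnd = 1 << (shift - 1)
    return [(sum(samples[i + k] * kernel[k] for k in range(t)) + rnd) >> shift
            for i in range(len(samples) - t + 1)]


# 8 output positions with a 4-tap filter need 11 input samples.
out = convolve_valid(list(range(11)), (0, 64, 0, 0), 6)
print(samples_needed(8, 4), len(out))  # 11 8
```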

[0231] By using the steps described below, it is possible to determine the size of the primary reference side portion and thus reduce the amount of internal memory required to store the samples of the primary reference side portion.

[0232] FIGS. 15A, 15B, 15C to 18 show some examples of intra prediction of a block from the reference samples of the primary reference side. For each row of samples of the block of prediction samples, a (possibly fractional) subpixel offset is determined. This offset may have an integer or a non-integer value. Its value depends on the difference between the selected directional intra-prediction mode M and the orthogonal intra-prediction mode M_o. M_o is either HOR_IDX or VER_IDX, whichever is closer to the selected intra-prediction mode.
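The orthogonal mode M_o and the mode difference M - M_o of [0232] can be computed as follows, assuming the VVC indices HOR_IDX = 18 and VER_IDX = 50 of the 67-mode scheme. The source does not say which mode is chosen when both are equally close (mode 34). This sketch assumes VER_IDX in that case, matching the rule that modes at or above the diagonal use the top side.

```python
HOR_IDX, VER_IDX = 18, 50  # VVC horizontal and vertical mode indices


def orthogonal_mode(mode: int) -> int:
    """Return M_o, the closer of HOR_IDX and VER_IDX (ties assumed to go to VER_IDX)."""
    if abs(mode - VER_IDX) <= abs(mode - HOR_IDX):
        return VER_IDX
    return HOR_IDX


def mode_difference(mode: int):
    """Return (M_o, M - M_o) for a directional mode M."""
    m_o = orthogonal_mode(mode)
    return m_o, mode - m_o


print(mode_difference(54), mode_difference(10))  # (50, 4) (18, -8)
```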

[0233] Tables 4 (Table 8) and 5 (Table 9) show the possible values of the subpixel offset for the first row of prediction samples, depending on the mode difference. The subpixel offset for the other rows of prediction samples is obtained by multiplying this subpixel offset by the difference between the row position of the prediction sample and the first row.
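Per [0233], the offset grows linearly from row to row. This sketch assumes rows are counted from 1, so that the first row keeps its table value, as in HEVC's ((y + 1) * p_ang) derivation. It splits each offset, given in 1/32-sample units, into its integer part (>> 5) and fractional part (& 31), as described in [0213]. The first-row offset of 13 used in the example is hypothetical, since the values of Tables 4 and 5 are not reproduced here.

```python
def row_offsets(first_row_offset_32: int, num_rows: int):
    """Return (offset, integer part, fractional part) for rows 1..num_rows.

    Offsets are in 1/32-sample units; the integer part is offset >> 5 and the
    fractional (subpixel) part is offset & 31.
    """
    result = []
    for row in range(1, num_rows + 1):
        off = first_row_offset_32 * row
        result.append((off, off >> 5, off & 31))
    return result


print(row_offsets(13, 3))  # [(13, 0, 13), (26, 0, 26), (39, 1, 7)]
```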

[0234] [Table 8]

[0235] [Table 9]

[0236] Suppose Table 4 (Table 8) or Table 5 (Table 9) is used to determine the subpixel offset for the bottom-right prediction sample. As shown in Figure 15A, the size of the primary reference side then equals the sum of three terms: the integer part of the maximum subpixel offset, the length of the block side of the prediction samples (i.e., the block side length), and half the interpolation filter length.
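The size described in [0236] can be written as the formula below. Here Δ_max is the maximum subpixel offset in 1/32-sample units, L_side is the block side length, and T is the interpolation filter length. The worked numbers assume a hypothetical 8x8 block, a hypothetical per-row offset of 6/32 sample, and a 4-tap filter.

$$
S_{\mathrm{ref}} = \left\lfloor \frac{\Delta_{\max}}{32} \right\rfloor + L_{\mathrm{side}} + \frac{T}{2}
= \left\lfloor \frac{8 \cdot 6}{32} \right\rfloor + 8 + \frac{4}{2} = 1 + 8 + 2 = 11
$$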

[0237] To obtain the size of the primary reference side for the selected directional intra-prediction mode, which provides a positive value for the subpixel offset, the following steps may be performed:

[0238] 1. Step 1 may consist of determining, based on the index of the selected intra-prediction mode, which side of the block is taken as the primary side and which adjacent samples are used to generate the primary reference side. The primary reference side is the line of reference samples used for predicting the samples of the current block. The primary side is the side of the block parallel to the primary reference side. If the (intra-prediction) mode is greater than or equal to the diagonal mode (for example, mode 34 as shown in Figure 10A), the adjacent samples above the block being predicted (i.e., the current block) are used to generate the primary reference side, and the top side is selected as the primary side. Otherwise, the adjacent samples to the left of the block being predicted are used to generate the primary reference side, and the left side is selected as the primary side. In summary, in Step 1 the primary side of the current block is determined based on the intra-prediction mode of the current block. The primary reference side is then determined based on the primary side; it includes the reference samples (some or all of them) used for predicting the current block. For example, as shown in Figure 15A, the primary reference side is parallel to the primary (block) side but may be longer than the block side. In other words, given an intra-prediction mode, each sample of the current block is assigned a corresponding reference sample from among the multiple reference samples (e.g., the samples of the primary reference side).

[0239] 2. Step 2 may consist of determining the maximum subpixel offset, which is calculated by multiplying the length of the non-primary side by the largest value from Table 4 (Table 8) or Table 5 (Table 9) for which the product is a non-integer subpixel offset. Tables 4 (Table 8) and 5 (Table 9) provide exemplary subpixel offsets for the samples in the first line of the block (the top row of samples when the top side is selected as the primary side, or the leftmost column of samples when the left side is selected as the primary side). The table values are therefore subpixel offsets per line of samples, and the maximum offset that occurs when predicting the whole block is obtained by multiplying the per-line value by the length of the non-primary side. In this example, since the fixed-point resolution is 1/32 sample, the product must not be a multiple of 32: if multiplying the length of the non-primary side by a per-line value from Table 4 (Table 8) or Table 5 (Table 9) yields a multiple of 32, the offset is a whole number of samples and the product is discarded. The non-primary side is the side of the block (top or left) that was not selected in Step 1. Therefore, if the top side is selected as the primary side, the length of the non-primary side is the height of the block, and if the left side is selected as the primary side, the length of the non-primary side is the width of the block.

[0240] 3. Step 3 may consist of taking the integer part of the subpixel offset obtained in Step 2 (i.e., the product described above right-shifted by 5 in binary representation) and adding to it the length of the primary side (the block width or block height, respectively) and half the interpolation filter length, which yields the size of the primary reference side. Thus, the primary reference side comprises a line of samples parallel to and equal in length to the primary side, extended by the adjacent samples covered by the integer part of the subpixel offset and by further adjacent samples amounting to half the interpolation filter length. Only half the interpolation filter length is required because the filter taps are split between samples lying within the integer part of the subpixel offset and an equal number of samples located beyond it.
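The three steps can be sketched in Python as follows. This is a minimal illustration under stated assumptions: the list `per_line_offsets` is a hypothetical stand-in for the per-line entries of Table 4 (Table 8) or Table 5 (Table 9), which are not reproduced here; the comparison against mode 34 uses "greater than or equal to" as in [0293]; and the non-primary length is the block height when the top side is primary, as in [0246].

```python
DIA_IDX = 34  # diagonal intra-prediction mode (mode 34 in Figure 10A)


def main_reference_side_size(mode, width, height, per_line_offsets, filter_len=4):
    """Sketch of Steps 1-3: size of the primary reference side.

    per_line_offsets: hypothetical per-line subpixel offsets in 1/32-sample
    units, standing in for Table 4 (Table 8) or Table 5 (Table 9).
    """
    # Step 1: modes at or above the diagonal use the top side as primary side.
    if mode >= DIA_IDX:
        main_len, non_main_len = width, height
    else:
        main_len, non_main_len = height, width

    # Step 2: largest whole-block offset that is non-integer, i.e. not a
    # multiple of 32 in 1/32-sample units.
    candidates = [non_main_len * d for d in per_line_offsets
                  if (non_main_len * d) % 32 != 0]
    if not candidates:
        raise ValueError("no non-integer subpixel offset available")
    max_offset = max(candidates)

    # Step 3: integer part (>> 5) + primary side length + half filter length.
    return (max_offset >> 5) + main_len + filter_len // 2
```

For an 8x4 block in a mode at or above 34 with the illustrative offsets `[29, 32, 57, 64]`, the products 128 and 256 are discarded as whole-sample offsets, 228 is the largest remaining offset, and the size is 7 + 8 + 2 = 17.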

[0241] According to another embodiment of the present disclosure, the reference samples used to obtain the predicted sample values are not adjacent to the block of prediction samples. The encoder may signal an offset value in the bitstream, which indicates the distance between the adjacent line of reference samples and the line of reference samples from which the predicted sample values are derived.

[0242] Figure 24 shows the possible positions of the reference sample lines and the corresponding values of the ref_offset variable.

[0243] Examples of offset values used in a specific implementation of a video codec (for example, a video encoder or video decoder) are as follows:
- using the adjacent line of reference samples (ref_offset=0, indicated by "reference line 0" in Figure 24);
- using the first line, i.e., the line closest to the adjacent line (ref_offset=1, indicated by "reference line 1" in Figure 24);
- using the third line (ref_offset=3, indicated by "reference line 3" in Figure 24).

[0244] The variable "ref_offset" has the same meaning as the variable "refIdx" used later in this disclosure: both indicate the reference line. For example, ref_offset = 0 indicates that "reference line 0" (as shown in Figure 24) is used.

[0245] The directional intra prediction mode specifies the value (deltaPos) of the subpixel offset between two adjacent lines of prediction samples. This value is represented as a fixed-point integer with 5-bit fractional precision (units of 1/32 sample). For example, deltaPos = 32 means that the offset between two adjacent lines of prediction samples is exactly 1 sample.
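To make the fixed-point convention concrete, the following Python sketch splits the accumulated offset for a given line into whole samples and a 1/32-sample fraction. The factor `(y + 1 + ref_offset)` is an assumption borrowed from the usual VVC formulation; it is consistent with [0246], where the last line of a block of height H gives (H + ref_offset) * deltaPos. The function name `line_offset` is hypothetical.

```python
def line_offset(y, delta_pos, ref_offset=0):
    """Integer and fractional parts of the subpixel offset for line y (0-based).

    delta_pos is the per-line offset in 1/32-sample units (5 fractional bits).
    The (y + 1 + ref_offset) factor is an assumed VVC-style convention.
    """
    pos = (y + 1 + ref_offset) * delta_pos
    # pos >> 5: whole samples; pos & 31: remaining fraction in 1/32 sample.
    return pos >> 5, pos & 31
```

With deltaPos = 32 the first line is shifted by exactly one sample, `(1, 0)`, while deltaPos = 29 on the fourth line gives 116/32, i.e. 3 whole samples plus 20/32.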

[0246] When the intra prediction mode is greater than DIA_IDX (mode #34), the size of the major reference side for the example described above is calculated as follows. Among the set of available intra prediction modes (i.e., those that the encoder may indicate for the block of prediction samples), the mode that is greater than DIA_IDX and provides the maximum deltaPos value is considered. The desired subpixel offset between the reference sample or interpolated sample position and the sample to be predicted is derived by adding ref_offset to the block height and multiplying the sum by the deltaPos value. If this product is divisible by 32 (i.e., the remainder is 0), the next largest deltaPos value is used instead, obtained as described above from the set of available intra prediction modes with the previously considered prediction mode skipped. Otherwise, the product is taken as the maximum non-integer subpixel offset. The integer part of this offset is obtained by shifting it 5 bits to the right.

[0247] The size of the major reference side is obtained by summing the integer part of the maximum non-integer sub-pixel offset, the width of the block of prediction samples, and half of the length of the interpolation filter (as shown in FIG. 15A).

[0248] Conversely, if the intra-prediction mode is smaller than DIA_IDX (mode #34), the size of the primary reference side for the example described above is calculated as follows. Among the set of available intra-prediction modes (i.e., modes that the encoder may indicate for a block of prediction samples), the mode that is smaller than DIA_IDX and provides the largest deltaPos value is considered. The desired subpixel offset is derived by adding ref_offset to the block width and multiplying the sum by the deltaPos value. If this product is divisible by 32 (i.e., the remainder is 0), the next largest deltaPos value is used instead, again skipping previously considered prediction modes when obtaining a mode from the set of available intra-prediction modes. Otherwise, the product is taken as the largest non-integer subpixel offset. The integer part of this offset is obtained by shifting it 5 bits to the right. The size of the primary reference side is obtained by summing the integer part of the largest non-integer subpixel offset, the height of the block of prediction samples, and half the interpolation filter length.
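The mode search in [0246] to [0248] can be sketched as below. The mapping from mode index to deltaPos is hypothetical (the actual values come from the tables of this disclosure and are not reproduced here), as are the function and parameter names; the 4-tap default filter length is an assumption.

```python
DIA_IDX = 34  # diagonal intra-prediction mode


def main_side_size_from_modes(delta_pos_by_mode, mode_above_diagonal,
                              width, height, ref_offset=0, filter_len=4):
    """Search available modes for the largest non-integer subpixel offset.

    delta_pos_by_mode: hypothetical mapping {mode index: deltaPos in 1/32 sample}.
    Returns the main reference side size, or None if every offset is integer.
    """
    if mode_above_diagonal:   # mode > DIA_IDX: top side is the main side
        modes = [m for m in delta_pos_by_mode if m > DIA_IDX]
        non_main, main = height, width
    else:                     # mode < DIA_IDX: left side is the main side
        modes = [m for m in delta_pos_by_mode if m < DIA_IDX]
        non_main, main = width, height

    # Visit modes from the largest deltaPos downwards, skipping any mode
    # whose offset is a whole number of samples (divisible by 32).
    for m in sorted(modes, key=lambda k: delta_pos_by_mode[k], reverse=True):
        offset = (non_main + ref_offset) * delta_pos_by_mode[m]
        if offset % 32 != 0:
            return (offset >> 5) + main + filter_len // 2
    return None
```

With the illustrative mapping `{72: 64, 71: 57, 66: 32, 3: 29, 2: 32}` and an 8x4 block above the diagonal, mode 72 is skipped (4 * 64 = 256) and mode 71 gives 228, so the size is 7 + 8 + 2 = 17; with ref_offset = 1 the result becomes 18.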

[0249] Figures 15A, 15B, and 15C to 18 show several examples of intra-prediction of blocks from the reference samples of the main reference side. For each row of samples of the block of prediction samples 1120, a fractional subpixel offset 1150 is determined. Depending on the difference between the selected directional intra-prediction mode $M$ and the orthogonal intra-prediction mode $M_o$ (either HOR_IDX or VER_IDX, whichever is closer to the selected intra-prediction mode), this offset may have an integer or non-integer value.

[0250] State-of-the-art video coding methods and their existing implementations rely on the fact that, for intra angular prediction, the size of the primary reference side is set to twice the length of the corresponding block side. For example, in HEVC, when the intra-prediction mode is 34 or greater (see Figure 10A or Figure 10B), primary reference side samples are taken from the upper and upper-right adjacent blocks if these blocks are available, i.e., have already been reconstructed and processed. The total number of adjacent samples used is set equal to twice the width of the block. Similarly, when the intra-prediction mode is less than 34 (see Figure 10), primary reference side samples are taken from the left and lower-left adjacent blocks, and the total number of adjacent samples is set equal to twice the height of the block.

[0251] However, when the subpixel interpolation filter is applied, additional samples are used at the left and right edges of the primary reference side. To maintain compatibility with existing solutions, it is proposed that these additional samples be obtained by padding the primary reference side on the left and right, i.e., by repeating its first and last samples, respectively. If the primary reference side is denoted as ref and its size as refS, the padding can be expressed as the following assignments: ref[-1] = p[0] and ref[refS+1] = p[refS].

[0252] In practice, negative indices can be avoided by applying a positive integer offset when referencing array elements. Specifically, this offset can be set equal to the number of elements padded on the left of the main reference side.
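A small Python sketch of this indexing approach: the padded side is stored in an ordinary list, and a logical index i (which may be -1) is read as `buf[i + off]`, where `off` is the number of left-padded elements. The helper name and the one-sample default padding on each side are illustrative assumptions.

```python
def pad_main_reference(p, n_left=1, n_right=1):
    """Pad by repeating the first and last samples.

    Returns (buf, off): logical element ref[i] (i may be negative)
    is stored at buf[i + off], so no negative list index is needed.
    """
    buf = [p[0]] * n_left + list(p) + [p[-1]] * n_right
    return buf, n_left
```

For p = [10, 20, 30] (last index refS = 2), ref[-1] is `buf[0]` and ref[refS + 1] is `buf[4]`, both equal to the repeated edge samples.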

[0253] Specific examples of how to perform right padding and left padding are given in the following two cases shown in Figure 15B.

[0254] As an example of right padding, consider the wide-angle modes 72 and -6 (Figure 10B), for which the value of $|M - M_o|$ is, for example, equal to 22 (see Table 4 (Table 8)):

[0255] [Math]

[0256] This specifies the subpixel offset given above. When the aspect ratio of the block is 2 (i.e., when the predicted block has dimensions 4x8, 8x16, 16x32, 32x64, 8x4, 16x8, 32x16, or 64x32), the corresponding maximum subpixel offset value for the bottom-right predicted sample

[0257] [Math]

[0258] is calculated as shown, where S is the smaller side of the block.

[0259] Therefore, for an 8x4 block, the maximum subpixel offset is

[0260] [Math]

[0261] i.e., the integer part of this maximum subpixel offset is equal to 7. When a 4-tap intra interpolation filter is applied to obtain the value of the bottom-right sample with coordinates x=7, y=3, the reference samples with indices x+7-1, x+7, x+7+1, and x+7+2 are used. Since the main reference side has 16 adjacent samples with indices 0..15, the rightmost sample position x+7+2 = 16 lies outside it and is padded by repeating the last sample of the main reference side, at position x+7+1 = 15.

[0262] The same steps are performed when Table 5 (Table 9) is used, for modes 71 and -5. The subpixel offset in this case is

[0263] [Math]

[0264] which is equivalent to

[0265] [Math]

[0266] and yields the maximum value.

[0267] For example, left padding occurs for the angular modes 35..65 and 19..33, for which the subpixel offset is fractional and less than one sample. For the top-left predicted sample, the corresponding subpixel offset value is calculated as follows; according to Table 4 (Table 8) and Table 5 (Table 9), this offset corresponds to an integer subsample offset of 0:

[0268] [Math]

[0269] This falls within the specified range. Applying a 4-tap interpolation filter to calculate the predicted sample with coordinates x=0, y=0 requires the reference samples with indices x-1, x, x+1, and x+2. The leftmost sample position is x-1 = -1. Since the primary reference side has 16 adjacent samples with indices 0..15, the sample at this position is padded by repeating the reference sample at position x = 0.

[0270] From the examples above, for blocks with an aspect ratio of 2, the main reference side is padded by half the length of the 4-tap filter, i.e., 2 samples, one of which is added to the beginning (left end) of the main reference side and the other to its end (right end). For a 6-tap interpolation filter, following the steps described above, two more samples should be added, at the beginning and end of the main reference side. Generally, when an N-tap intra interpolation filter is used, the main reference side is padded with

[0271] $N - 2$

[0272] samples, of which

[0273] $N/2 - 1$

[0274] are padded on the left side and

[0275] $N/2 - 1$

[0276] are padded on the right side, where N is a non-negative even integer value.
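The tap positions used in the two padding examples ([0261] and [0269]) can be reproduced with the Python sketch below. It assumes the tap alignment shown for the 4-tap filter in the text, generalized to N taps as N/2 - 1 positions before the integer position and N/2 positions at or after it; out-of-range positions are served by repeating the first or last sample. The function names are hypothetical.

```python
def tap_positions(x, int_offset, n_taps=4):
    """Main-reference-side positions read by an N-tap filter for column x.

    Assumes the alignment of the 4-tap example (x+i-1, x+i, x+i+1, x+i+2 for
    integer offset i): N/2 - 1 taps before the integer position, N/2 from it on.
    """
    base = x + int_offset
    return list(range(base - (n_taps // 2 - 1), base + n_taps // 2 + 1))


def fetch_padded(ref, pos):
    """Read ref[pos], repeating the first/last sample outside 0..len(ref)-1."""
    return ref[min(max(pos, 0), len(ref) - 1)]
```

For the 8x4 example, `tap_positions(7, 7)` gives 13..16 and position 16 is served by sample 15; for the top-left sample, `tap_positions(0, 0)` gives -1..2 and position -1 is served by sample 0. With a 6-tap filter the leftmost position from x = 0 is -2, i.e. two samples of left padding.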

[0277] Repeating the steps described above for other block aspect ratios yields the following offsets (see Table 6 (Table 10)).

[0278] [Table 10]

[0279] Based on the values given in Table 6 (Table 10), the following applies to the wide-angle intra-prediction mode. When Table 4 (Table 8) is in use, 4-tap interpolation filtering requires left and right padding operations for block sizes 4x8, 8x4, 8x16, and 16x8. When Table 5 (Table 9) is in use, for 4-tap interpolation filtering, left padding and right padding operations are required only for block sizes 4x8 and 8x4.

[0280] Details of the proposed method are described in Table 7 (Table 11) in the specification format. The padding embodiment described above can be expressed as the following modifications to the VVC draft (Section 8.2.4.2.7).

[0281] [Table 11]

[0282] Tables 4 (Table 8) and 5 (Table 9), as described above, represent the possible values of the subpixel offset between two adjacent lines of a prediction sample, depending on the intra-prediction mode.

[0283] State-of-the-art video coding solutions use different interpolation filters in intra-prediction. Specifically, Figures 19-21 show various examples of interpolation filters.

[0284] In the present invention, as shown in Figure 22 or Figure 23, an intra-prediction process is performed for the block, during which a subpixel interpolation filter (such as a 4-tap filter) is applied to the luminance and chrominance reference samples. The subpixel interpolation filter is selected based on the subpixel offset between the position of the reference sample and the position of the sample being interpolated, and the size of the main reference side used in the intra-prediction process is determined according to the length of the subpixel interpolation filter and the intra-prediction mode that yields the maximum value of the subpixel offset. The memory requirement is thus determined by the maximum value of the subpixel offset.

[0285] Figure 15B shows an example where the top-left sample is not included in the primary reference side, but is instead padded using the leftmost sample belonging to the primary reference side. However, if the predicted sample is calculated by applying a 2-tap subpixel interpolation filter (e.g., a linear interpolation filter), the top-left sample is not referenced, and therefore padding is not required in this case.

[0286] Figure 15C shows an example of when a 4-tap subpixel interpolation filter (e.g., a Gaussian filter, a DCT-IF filter, or a cubic filter) is used. It can be noted that in this case, at least four reference samples are needed to calculate the top-left predicted sample (marked as "A"), namely the top-left sample (marked as "B") and the next three samples (marked as "C", "D", and "E", respectively).

[0287] In this case, two alternative methods are disclosed.

[0288] Use the value of C to pad the value of B.

[0289] Use reconstructed samples of the adjacent block, obtained in exactly the same way as the other samples of the main reference side ("C", "D", and "E"). In this case, the size of the main reference side is determined as the sum of:
- the block main side length (i.e., the size of the side of the block of predicted samples);
- half the interpolation filter length minus 1; and
- the larger M of the following two values:
  - the block main side length; and
  - the integer part of the maximum subpixel offset + half the interpolation filter length, or the integer part of the maximum subpixel offset + half the interpolation filter length + 1 (for memory considerations, the addition of 1 to this sum may or may not be included).

[0290] Please note that “block main side,” “block side length,” “block main side length,” and “size of the block side of the predicted sample” are the same concept throughout this disclosure.

[0291] The term "half the interpolation filter length minus 1" in the size of the primary reference side can be understood as allowing the primary reference side to be extended to the left.

[0292] Likewise, using the larger M of the two values above in the size of the main reference side can be understood as allowing the main reference side to be extended to the right.

[0293] In the above explanation, the block main side length is determined according to the intra-prediction mode (Figure 10B). If the intra-prediction mode is greater than or equal to the diagonal intra-prediction mode (#34), the block main side length is the width of the block in the prediction sample (i.e., the block to be predicted). Otherwise, the block main side length is the height of the block in the prediction sample.
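The size rule of [0289], together with the choice of main side in [0293], can be written as the following Python sketch. The function and parameter names are hypothetical, the integer part of the maximum non-integer subpixel offset is passed in rather than derived, and `plus_one` selects the optional +1 mentioned in [0289].

```python
DIA_IDX = 34  # diagonal intra-prediction mode


def main_reference_size(mode, width, height, max_int_offset,
                        filter_len=4, plus_one=False):
    """Main reference side size per [0289]-[0293] (sketch).

    max_int_offset: integer part of the maximum non-integer subpixel offset.
    plus_one: include the optional +1 (a memory-related design choice).
    """
    # Block main side length: width for modes >= diagonal, otherwise height.
    n_tbs = width if mode >= DIA_IDX else height
    half = filter_len // 2
    # M: the larger of the main side length and offset + half filter length.
    m = max(n_tbs, max_int_offset + half + (1 if plus_one else 0))
    # Main side + left extension (half - 1) + right extent M.
    return n_tbs + (half - 1) + m
```

For the 4x4 example of [0301] to [0304] (integer part 3, 4-tap filter), `main_reference_size(66, 4, 4, 3)` gives 4 + 1 + 5 = 10, or 11 with `plus_one=True`.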

[0294] Subpixel offset values can be defined for a wider range of angles (see Table 8 (Table 12)).

[0295] [Table 12]

[0296] Depending on the aspect ratio, different maximum and minimum values of the intra-prediction mode index (Figure 10B) are permitted. Table 9 (Table 13) provides an example of this mapping.

[0297] [Table 13]

[0298] According to Table 9 (Table 13), for the maximum mode difference value $\max(|M - M_o|)$, integer subpixel offsets are used for interpolation (the maximum subpixel offset per row is a multiple of 32), which means that the predicted samples in the prediction block are calculated by copying the values of the corresponding reference samples, and no subsample interpolation filter is applied.

[0299] Considering the constraints on $\max(|M - M_o|)$ in Table 9 (Table 13) and the values in Table 8 (Table 12), the maximum subpixel offset per row, excluding the integer offsets that do not require interpolation, is defined as follows (see Table 10 (Table 14)):

[0300] [Table 14]

[0301] Using Table 10 (Table 14), the integer part of the maximum subpixel offset plus half the interpolation filter length for a 4x4 square block can be calculated using the following steps:

[0302] Step 1. The main side length of the block (equal to 4) is multiplied by 29, and the result (116) is divided by 32 using integer division (i.e., 116 >> 5), giving the value 3.

[0303] Step 2. Half of the 4-tap interpolation filter length is 2, and this is added to the value obtained in Step 1, resulting in a value of 5.

[0304] From the example above, it can be observed that the obtained value is larger than the block main side length. In this example, the size of the main reference side is set to 10 (or 11 when the addition of 1 is included), which is determined as the sum of:
- the block main side length (equal to 4);
- half the interpolation filter length minus 1 (equal to 1); and
- the larger M of the following two values:
  - the block main side length (equal to 4); and
  - the integer part of the maximum subpixel offset + half the interpolation filter length (equal to 5), or the integer part of the maximum subpixel offset + half the interpolation filter length + 1 (equal to 6) (for memory reasons, the addition of 1 to this sum may or may not be included).

[0305] The total number of reference samples included in the main reference side (10 in this example) is greater than twice the block main side length (8).

[0306] Right padding is not performed when the larger M of the two values equals the block main side length. Otherwise, right padding is applied to the reference samples located at a horizontal or vertical distance of 2*nTbS (where nTbS denotes the block main side length) or more from the position of the top-left predicted sample (shown as "A" in Figure 15C). Right padding is performed by assigning to each padded sample the value of the last reference sample of the main reference side located within the range of 2*nTbS.
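One way to read [0306] in code: positions are measured from the top-left predicted sample "A" along the main side, so "B" is position -1, and every position at or beyond 2*nTbS takes the value of the last sample inside that range. The accessor `recon` stands in for fetching reconstructed adjacent samples and is hypothetical; the sketch reads "B" directly (the second alternative of [0289]) rather than padding it.

```python
def build_main_reference(recon, n_tbs, m, filter_len=4):
    """Collect the main reference side, right-padding beyond 2*nTbS (sketch).

    recon(k): hypothetical accessor for the reconstructed reference sample at
    distance k from sample "A" along the main side (k = -1 is sample "B");
    only positions k < 2 * n_tbs are ever read.
    """
    start = -(filter_len // 2 - 1)  # left extension: half filter length minus 1
    end = n_tbs + m                 # exclusive; size = n_tbs + (N/2 - 1) + M
    last = 2 * n_tbs - 1            # last position available without padding
    # Positions beyond `last` repeat the sample at `last` (right padding).
    return [recon(min(k, last)) for k in range(start, end)]
```

With n_tbs = 4 and M = 5 the result has 10 samples and the final one repeats position 7; with M = 4 no position reaches 2*nTbS, so no padding occurs.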

[0307] When half the interpolation filter length minus 1 is greater than 0, the value of sample "B" (shown in Figure 15C) is either obtained by left padding, or the corresponding reference sample can be obtained in exactly the same way that reference samples "C", "D", and "E" are obtained.

[0308] Details of the proposed method are described in Table 11 (Table 15) in the specification format. Instead of right padding or left padding, the corresponding reconstructed adjacent reference sample may be used. An example in which left padding is not used is shown in the following section of the VVC specification (Section 8.2).

[0309] [Table 15]

[0310] Similarly, using Table 10 (Table 14), the integer part of the maximum subpixel offset plus half the interpolation filter length for a non-square block with a width of 4 samples and a height of 2 samples can be calculated (if the main side length of the block is the width) using the following steps:

[0311] Step 1. The block height (equal to 2) is multiplied by 57, and the result (114) is divided by 32 using integer division (i.e., 114 >> 5), giving the value 3.

[0312] Step 2. Half of the 4-tap interpolation filter length is 2, and this is added to the value obtained in Step 1, resulting in a value of 5.

[0313] The remaining steps for calculating the total number of reference samples to be included in the main reference side are the same as in the case of the square block.
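Carrying the remaining steps through as a worked check, assuming the same sum as for the square block in [0304] with the 4-tap filter and leaving out the optional +1 (the main side is the width, so nTbS = 4):

$$
\left\lfloor \frac{2 \cdot 57}{32} \right\rfloor + \frac{4}{2} = 3 + 2 = 5,
\qquad
4 + \left(\frac{4}{2} - 1\right) + \max(4,\, 5) = 4 + 1 + 5 = 10
$$

With the optional +1, M becomes 6 and the size becomes 11.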

[0314] Using the block dimensions from Table 10 (Table 14) and Table 6 (Table 10), it can be noted that the maximum number of reference samples that receive left or right padding is 2.

[0315] If the reconstructed reference samples used in the intra-prediction process are not adjacent to the block to be predicted (the reference line may be selected as shown in Figure 24), the embodiments described below are applicable.

[0316] The first step is to determine the aspect ratio of the block according to which side of the predicted block is selected as the main side by the intra-prediction mode. If the top side of the block is selected as the main side, the aspect ratio $R_a$ (referred to as "whRatio" in the VVC specification) is set equal to the result of integer division of the block width (referred to as "nTbW" in the VVC specification) by the block height (referred to as "nTbH" in the VVC specification). Otherwise, when the main side is the left side of the prediction block, $R_a$ (referred to as "hwRatio" in the VVC specification) is set equal to the result of integer division of the block height by the block width. In either case, if the value of $R_a$ is less than 1 (i.e., the numerator of the integer division is less than the denominator), it is set equal to 1.

[0317] The second step is to add a portion of the reference samples (denoted "p" in the VVC specification) to the main reference side. Depending on the value of refIdx, either adjacent or non-adjacent reference samples are used. The reference samples added to the main reference side are selected using an offset relative to the main block side, along the direction of the main side. Specifically, if the main side is the top side of the prediction block, the offset is horizontal and equal to -refIdx samples; if the main side is the left side of the prediction block, the offset is vertical and equal to -refIdx samples. In this step, nTbS + 1 samples are added, starting from the position of the top-left reference sample (the "B" sample in Figure 15C) shifted by the offset described above, where nTbS denotes the length of the main side. The definition of refIdx is given in this disclosure in connection with Figure 24.

[0318] The subsequent steps depend on whether the subpixel offset (referred to as "intraPredAngle" in the VVC specification) is positive or negative. A subpixel offset of 0 corresponds to the horizontal intra-prediction mode (when the main side is the left side of the block) or the vertical intra-prediction mode (when the main side is the top side of the block).

[0319] If the subpixel offset is negative (step 3, negative subpixel offset), in the third step the primary reference side is extended to the left using the reference samples corresponding to the non-primary side. The non-primary side is the side not selected as the primary side; that is, when the intra-prediction mode is 34 or greater (Figure 10B), the non-primary side is the left side of the block to be predicted, and otherwise it is the top side of the block. The extension is performed as shown in Figure 7 and described in connection with Figure 7. The reference samples corresponding to the non-primary side are selected by the process of the second step, except that the non-primary side is used instead of the primary side. Once this step is complete, the primary reference side is extended at its beginning and end by repeating its first and last samples, respectively; in other words, the padding for a negative subpixel offset is performed in step 3.

[0320] If the subpixel offset is positive (for example, step 3, positive subpixel offset), in the third step, the primary reference side is extended to the right by an additional nTbS samples in the same way as described in step 2. Right padding is performed if the value of refIdx is greater than 0 (the reference sample is not adjacent to the block that should be predicted). The number of samples to be right-padded is equal to the aspect ratio Ra calculated in the first step multiplied by the refIdx value. If a 4-tap filter is in use, the number of samples to be right-padded increases by 1.
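The first step ([0316]) and the right-padding count of [0320] can be sketched in Python as follows. The function name is hypothetical; treating only `filter_len == 4` as adding the extra sample follows the wording of [0320] and is otherwise an assumption.

```python
def right_pad_count(width, height, top_is_main, ref_idx, filter_len=4):
    """Number of right-padded samples for a positive subpixel offset (sketch)."""
    # Step 1: aspect ratio Ra (whRatio or hwRatio), never smaller than 1.
    ra = width // height if top_is_main else height // width
    ra = max(ra, 1)
    # No right padding when the adjacent reference line is used (refIdx == 0).
    if ref_idx <= 0:
        return 0
    count = ra * ref_idx
    # One additional sample when a 4-tap interpolation filter is in use.
    if filter_len == 4:
        count += 1
    return count
```

For a 16x4 block with the top side as main side and refIdx = 3, Ra = 4 and 4 * 3 + 1 = 13 samples are right-padded.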

[0321] Details of the proposed method are described in Table 12 (Table 16) in the specification format. The following modifications to the VVC specification for this embodiment are possible (refW is set to nTbS-1).

[0322] [Table 16]

[0323] The portion of the VVC specification described above is also applicable to the case where the primary reference side is left-padded by only one sample in the third step for positive values of the subpixel offset. Details of the proposed method are described in Table 13 (Table 17) in the specification format.

[0324] [Table 17]

[0325] This disclosure provides an intra-prediction method for predicting the current block contained within a picture, such as a video frame. The method steps of the intra-prediction method are shown in Figure 25. The current block is the aforementioned block comprising a sample to be predicted (or “predicted sample” or “prediction sample”), for example, a luminance sample or a chrominance sample.

[0326] The method includes the step of determining the size of the main reference side (S2510) based on the intra-prediction mode that yields the largest non-integer value of the subpixel offset from among several available intra-prediction modes (for example, shown in Figures 10-11), and the size (i.e., length) of the interpolation filter.

[0327] A subpixel offset is the offset between a sample in the current block that is to be predicted (the "target sample") and the reference sample (or reference sample position) from which it is predicted. The offset is associated with an angular prediction mode when the reference samples are not located directly above (e.g., for modes with an index greater than or equal to the diagonal mode) or directly to the left (e.g., for modes with an index less than or equal to the diagonal mode) of the target sample, but are shifted relative to its position. Since not all modes point to integer reference sample positions, the offset has subpixel resolution: it may take a non-integer value consisting of an integer part and a fractional part. For a non-integer subpixel offset, interpolation is performed between reference samples, so the offset is the offset between the position of the sample to be predicted and the interpolated reference sample position. The maximum non-integer value is the largest non-integer value (integer part plus fractional part) over all samples in the current block. For example, as shown in Figures 15A to 15C, the target sample associated with the maximum non-integer subpixel offset may be the bottom-right sample of the current block. Note that intra-prediction modes that result in integer offsets larger than the maximum non-integer value of the subpixel offset are ignored.

[0328] The possible sizes (i.e., lengths) of an interpolation filter include 4 (for example, if the filter is a 4-tap filter) or 6 (for example, if the filter is a 6-tap filter).

[0329] The method further includes the steps of applying an interpolation filter to the reference samples contained within the main reference side (S2520) and predicting the target samples contained within the current block based on the filtered reference samples (S2530).
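To tie steps S2510 to S2530 together, the following Python sketch predicts a block from a top main reference side for a positive deltaPos. It assumes VVC-style indexing in which ref[0] is sample "B" and the four taps read ref[x + iIdx] to ref[x + iIdx + 3], 6-bit filter coefficients with rounding, and a length check standing in for the size determination of S2510. The coefficient function `linear_as_4tap` is a hypothetical stand-in (linear interpolation expressed as four taps), not one of the filters of Figures 19 to 21.

```python
def predict_block(ref, width, height, delta_pos, coeffs):
    """Predict a block from the top main reference side (deltaPos >= 0 only).

    ref[0] is the top-left sample "B"; ref[1 + x] lies directly above column x.
    coeffs(frac) returns four taps summing to 64 (6-bit precision).
    """
    # S2510 stand-in: the buffer must hold every position the filter reads.
    needed = width + ((height * delta_pos) >> 5) + 3
    if len(ref) < needed:
        raise ValueError(f"main reference side too short: {len(ref)} < {needed}")
    pred = []
    for y in range(height):
        pos = (y + 1) * delta_pos
        i_idx, i_fact = pos >> 5, pos & 31
        f = coeffs(i_fact)
        row = []
        for x in range(width):
            # S2520/S2530: 4-tap interpolation over ref[x+iIdx .. x+iIdx+3].
            acc = sum(f[i] * ref[x + i_idx + i] for i in range(4))
            row.append((acc + 32) >> 6)
        pred.append(row)
    return pred


def linear_as_4tap(frac):
    """Hypothetical 4-tap filter equivalent to 2-tap linear interpolation."""
    return [0, 64 - 2 * frac, 2 * frac, 0]
```

For a 4x4 block with deltaPos = 16 the check requires 9 reference samples, since the largest integer offset is (4 * 16) >> 5 = 2.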

[0330] Corresponding to the method shown in Figure 25, a device 2600 for intra-prediction of the current block contained within the picture is also provided. The device 2600 is shown in Figure 26 and may be included in the video encoder shown in Figure 2 or the video decoder shown in Figure 3. In one example, the device 2600 may correspond to the intra-prediction unit 254 in Figure 2. In another example, the device 2600 may correspond to the intra-prediction unit 354 in Figure 3.

[0331] The apparatus 2600 includes an intra-prediction unit 2610 configured to predict the target samples contained within the current block based on filtered reference samples. The intra-prediction unit 2610 may be the intra-prediction unit 254 shown in Figure 2 or the intra-prediction unit 354 shown in Figure 3.

[0332] The intra-prediction unit 2610 includes a determination unit 2620 (or “primary reference size determination unit”) configured to determine the size of the primary reference side used in intra-prediction. Specifically, the size is determined based on an intra-prediction mode (of which there are several available intra-prediction modes) that yields the largest non-integer value of the subpixel offset between a target sample (of which there are several target samples) in the current block and a reference sample (hereinafter referred to as the “target reference sample”) used to predict the target sample in the current block, and based on the size of the interpolation filter to be applied to the reference sample contained in the primary reference side. The target sample is any sample in the block to be predicted. The target reference sample is one of the reference samples in the primary reference side.

[0333] The intra-prediction unit 2610 further comprises a filtering unit configured to apply interpolation filters to reference samples contained within the main reference side in order to obtain filtered reference samples.

[0334] In summary, the memory requirements are determined by the maximum subpixel offset. By determining the size of the main reference side in accordance with this disclosure, memory can therefore be used efficiently in video coding with intra prediction. Specifically, the memory (buffer) used by the encoder and/or decoder to perform intra prediction can be allocated efficiently according to the determined size of the main reference side. Firstly, the main reference side of the size determined according to this disclosure contains all the reference samples needed to predict the current block, so no further sample accesses are required to perform intra prediction. Secondly, memory does not need to be allocated for all samples already processed in adjacent blocks; instead, it can be allocated specifically for the main reference side, i.e., for the reference samples within the determined size.

[0335] The following describes the encoding method, the decoding method as shown in the embodiments described above, and examples of applications of systems using them.

[0336] Figure 27 is a block diagram showing a content supply system 3100 for realizing a content distribution service. This content supply system 3100 includes a capture device 3102, a terminal device 3106, and optionally a display 3126. The capture device 3102 communicates with the terminal device 3106 via a communication link 3104. The communication link may include the communication channel 13 described above. The communication link 3104 may include, but is not limited to, Wi-Fi, Ethernet, cable, wireless (3G / 4G / 5G), USB, or any combination thereof.

[0337] The capture device 3102 generates data and may encode the data using the encoding method shown in the embodiments described above. Alternatively, the capture device 3102 may distribute the data to a streaming server (not shown in the figure), which encodes the data and transmits the encoded data to the terminal device 3106. The capture device 3102 includes, but is not limited to, a camera, a smartphone or tablet, a computer or laptop, a video conferencing system, a PDA, an in-vehicle device, or any combination thereof. For example, the capture device 3102 may include the source device 12 as described above. When the data includes video, the video encoder 20 included in the capture device 3102 may actually perform the video encoding process. When the data includes audio (i.e., voice), the audio encoder included in the capture device 3102 may actually perform the audio encoding process. In some practical scenarios, the capture device 3102 distributes the encoded video and audio data by multiplexing them together. In other practical scenarios, for example in a video conferencing system, the encoded audio data and encoded video data are not multiplexed; in that case, the capture device 3102 distributes the encoded audio data and the encoded video data to the terminal device 3106 separately.

[0338] In the content supply system 3100, the terminal device 3106 receives and plays back the encoded data. The terminal device 3106 may be a device with data receiving and recovering capability, such as a smartphone or tablet 3108, a computer or laptop 3110, a network video recorder (NVR) / digital video recorder (DVR) 3112, a TV 3114, a set-top box (STB) 3116, a video conferencing system 3118, a video surveillance system 3120, a personal digital assistant (PDA) 3122, an in-vehicle device 3124, or any combination thereof, that is capable of decoding the encoded data described above. For example, the terminal device 3106 may include the destination device 14 as described above. When the encoded data includes video, the video decoder 30 included in the terminal device is prioritized to perform video decoding. When the encoded data includes audio, the audio decoder included in the terminal device is prioritized to perform audio decoding.

[0339] Terminal devices having a display, such as a smartphone or tablet 3108, a computer or laptop 3110, a network video recorder (NVR) / digital video recorder (DVR) 3112, a TV 3114, a personal digital assistant (PDA) 3122, or an in-vehicle device 3124, can supply the decoded data to their own display. For terminal devices without a display, such as an STB 3116, a video conferencing system 3118, or a video surveillance system 3120, an external display 3126 is connected to them to receive and display the decoded data.

[0340] When each device in this system performs encoding or decoding, a picture encoding device or picture decoding device as shown in the embodiments described above may be used.

[0341] Figure 28 shows the structure of an example terminal device 3106. After the terminal device 3106 receives a stream from the capture device 3102, the protocol processing unit 3202 analyzes the transmission protocol of the stream. The protocol may include, but is not limited to, Real-Time Streaming Protocol (RTSP), Hypertext Transfer Protocol (HTTP), HTTP Live Streaming (HLS), MPEG-DASH, Real-Time Transport Protocol (RTP), Real-Time Messaging Protocol (RTMP), or any combination thereof.

[0342] After the protocol processing unit 3202 processes the stream, a stream file is generated. The file is output to the demultiplexing unit 3204. The demultiplexing unit 3204 can separate the multiplexed data into encoded audio data and encoded video data. As described above, in some practical scenarios, for example, in a video conferencing system, the encoded audio data and encoded video data are not multiplexed. In this situation, the encoded data is sent to the video decoder 3206 and audio decoder 3208 without passing through the demultiplexing unit 3204.

[0343] Through demultiplexing, a video elementary stream (ES), an audio ES, and optionally subtitles are generated. The video decoder 3206, including the video decoder 30 as described in the above embodiment, decodes the video ES using the decoding method shown in the above embodiment to generate video frames and supplies this data to the synchronization unit 3212. The audio decoder 3208 decodes the audio ES to generate audio frames and supplies this data to the synchronization unit 3212. Alternatively, the video frames may be stored in a buffer (not shown in Figure Y) before being supplied to the synchronization unit 3212. Similarly, the audio frames may be stored in a buffer (not shown in Figure Y) before being supplied to the synchronization unit 3212.
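The routing described in [0342] and [0343] can be sketched as follows. The packets are assumed to carry a simple stream-type tag, which is a hypothetical simplification (real container formats identify their streams in their own ways), and `Packet`, `demultiplex`, and `feed_decoders` are illustrative names only.

```python
from dataclasses import dataclass


@dataclass
class Packet:
    kind: str        # "video", "audio" or "subtitle" (hypothetical tag)
    payload: bytes


def demultiplex(packets):
    """Split one multiplexed packet sequence into elementary streams (ES)."""
    streams = {"video": [], "audio": [], "subtitle": []}
    for pkt in packets:
        streams[pkt.kind].append(pkt.payload)
    return streams


def feed_decoders(data, multiplexed):
    """Return per-decoder inputs; non-multiplexed data bypasses the demultiplexer."""
    if multiplexed:
        return demultiplex(data)
    # Assumption: non-multiplexed input already arrives as separate streams,
    # e.g. {"video": [...], "audio": [...]}.
    return data
```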

[0344] The synchronization unit 3212 synchronizes video frames and audio frames and supplies video / audio to the video / audio display 3214. For example, the synchronization unit 3212 synchronizes the presentation of video with audio information. The information may be coded in the syntax using timestamps related to the presentation of coded audio and visual data, as well as timestamps related to the delivery of the data stream itself.
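One simple way a synchronization unit could align the streams is to pair each video frame with the audio frame whose presentation timestamp (PTS) is closest. This is an assumed strategy for illustration, not the method of the synchronization unit 3212; timestamps are in arbitrary units, both lists are assumed sorted in ascending order, the audio list is assumed non-empty, and `pair_by_pts` is a hypothetical name.

```python
import bisect


def pair_by_pts(video_pts, audio_pts):
    """For each video PTS, return the index of the audio frame with the nearest PTS."""
    pairs = []
    for v in video_pts:
        i = bisect.bisect_left(audio_pts, v)   # first audio PTS >= v
        # The nearest audio frame is either just before or at the insertion point.
        candidates = [j for j in (i - 1, i) if 0 <= j < len(audio_pts)]
        pairs.append(min(candidates, key=lambda j: abs(audio_pts[j] - v)))
    return pairs
```

On a tie, `min` keeps the earlier audio frame because candidates are listed in ascending order.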

[0345] If the stream contains subtitles, the subtitle decoder 3210 decodes the subtitles, synchronizes them with the video and audio frames, and supplies the video / audio / subtitle to the video / audio / subtitle display 3216.

[0346] The present invention is not limited to the system described above, and either the picture encoding device or the picture decoding device in the above embodiments may be incorporated into other systems, such as automotive systems.

[0347] While embodiments of the present invention are described primarily in relation to video coding, it should be noted that embodiments of the coding system 10, encoder 20, and decoder 30 (and correspondingly the system 10), as well as the other embodiments described herein, may also be configured for still picture processing or coding, i.e., the processing or coding of an individual picture independently of any preceding or consecutive picture, unlike in video coding. In general, when picture processing or coding is limited to a single picture 17, only the inter-prediction units 244 (encoder) and 344 (decoder) may not be available. All other functionalities (also called tools or techniques) of the video encoder 20 and video decoder 30 can equally be used for still picture processing, such as residual calculation 204 / 304, transform 206, quantization 208, inverse quantization 210 / 310, (inverse) transform 212 / 312, partitioning 262 / 362, intra prediction 254 / 354, and / or loop filtering 220 / 320, as well as entropy coding 270 and entropy decoding 304.

[0348] For example, embodiments of the encoder 20 and decoder 30, and the functions described herein with reference to, for example, the encoder 20 and decoder 30, may be implemented in hardware, software, firmware, or any combination thereof. If implemented in software, the functions may be stored on, or transmitted over, a computer-readable medium as one or more instructions or code and executed by a hardware-based processing unit. Computer-readable media may include computer-readable storage media, which correspond to tangible media such as data storage media, or communication media, including any medium that facilitates transfer of a computer program from one place to another, for example according to a communication protocol. In this manner, computer-readable media may generally correspond to (1) non-transitory tangible computer-readable storage media or (2) a communication medium such as a signal or carrier wave. Data storage media may be any available media that can be accessed by one or more computers or one or more processors to retrieve instructions, code, and / or data structures for implementing the techniques described herein. A computer program product may include a computer-readable medium.

[0349] By way of example, and not limitation, such computer-readable storage media may include RAM, ROM, EEPROM, CD-ROM or other optical disk storage, magnetic disk storage or other magnetic storage devices, flash memory, or any other medium that can be used to store desired program code in the form of instructions or data structures and that can be accessed by a computer. Also, any connection is properly termed a computer-readable medium. For example, if instructions are transmitted from a website, server, or other remote source using a coaxial cable, fiber optic cable, twisted pair, digital subscriber line (DSL), or wireless technologies such as infrared, radio, and microwave, then the coaxial cable, fiber optic cable, twisted pair, DSL, or wireless technologies such as infrared, radio, and microwave are included in the definition of medium. It should be understood, however, that computer-readable storage media and data storage media do not include connections, carrier waves, signals, or other transitory media, but are instead directed to non-transitory, tangible storage media. As used herein, "disk" and "disc" include compact disc (CD), LaserDisc®, optical disc, digital versatile disc (DVD), floppy disk, and Blu-ray® disc, where disks usually reproduce data magnetically, while discs reproduce data optically with lasers. Combinations of the above should also be included within the scope of computer-readable media.

[0350] Instructions may be executed by one or more processors, such as one or more digital signal processors (DSPs), general-purpose microprocessors, application-specific integrated circuits (ASICs), field-programmable logic arrays (FPGAs), or other equivalent integrated or discrete logic circuitry. Accordingly, the term “processor” as used herein may refer to any of the foregoing structures or any other structure suitable for implementing the techniques described herein. In addition, in some embodiments, the functionality described herein may be provided within dedicated hardware and / or software modules configured for encoding and decoding, or incorporated into a combined codec. Furthermore, the techniques may be fully implemented in one or more circuits or logic elements.

[0351] The techniques of this disclosure may be implemented in a wide variety of devices or apparatuses, including a wireless handset, an integrated circuit (IC), or a set of ICs (e.g., a chipset). Various components, modules, or units are described in this disclosure to emphasize functional aspects of devices configured to perform the disclosed techniques, but do not necessarily require realization by different hardware units. Rather, as described above, various units may be combined in a codec hardware unit or provided by a collection of interoperable hardware units, including one or more processors as described above, in conjunction with suitable software and / or firmware.

[Explanation of symbols]

[0352]
10 Video coding system, coding system
12 Source device
13 Communication channel
14 Destination device
16 Picture source
17 Picture, picture data, unprocessed picture, unprocessed picture data
18 Preprocessor, preprocessing unit, picture preprocessor
19 Preprocessed picture, preprocessed picture data
20 Video encoder, encoder
21 Encoded picture data, bitstream, encoded bitstream
22, 28 Communication interface, communication unit
30 Video decoder, decoder
31 Decoded picture, decoded picture data
32 Post-processor, post-processing unit
33 Post-processed picture, post-processed picture data
34 Display device
40 Video coding system
41 Imaging device
42 Antenna
43 Processor
44 Memory store
45 Display device
46 Processing unit
47 Logic circuitry
201 Input unit, input interface
203 Picture block
204 Residual calculation unit
205 Residual block, residual
206 Transform processing unit
207 Transform coefficients
208 Quantization unit
209 Quantized coefficients, quantized transform coefficients, quantized residual coefficients
210 Inverse quantization unit
211 Dequantized coefficients, dequantized residual coefficients
212 Inverse transform processing unit
213 Reconstructed residual block, transform block
214 Reconstruction unit
215 Reconstructed block
216 Buffer
220 Loop filter, loop filter unit
221 Filtered block, filtered reconstructed block
230 Decoded picture buffer (DPB)
231 Decoded picture
244 Inter prediction unit
254 Intra prediction unit
260 Mode selection unit
262 Partitioning unit
265 Prediction block
266 Syntax elements
270 Entropy encoding unit
272 Output unit, output interface
304 Entropy decoding unit
309 Quantized coefficients
310 Inverse quantization unit
311 Transform coefficients, dequantized coefficients
312 Inverse transform processing unit
313 Transform block, reconstructed residual block
314 Reconstruction unit
315 Reconstructed block
320 Loop filter, loop filter unit
321 Filtered block
330 Decoded picture buffer (DPB)
331 Decoded picture
344 Inter prediction unit
354 Intra prediction unit
360 Mode selection unit
365 Prediction block
400 Video coding device
410 Ingress port, input port
420 Receiver unit (Rx)
430 Processor, logic unit, central processing unit (CPU)
440 Transmitter unit (Tx)
450 Egress port, output port
460 Memory
470 Coding module
500 Device
502 Processor
504 Memory
506 Data
508 Operating system
510 Application program
512 Bus
514 Secondary storage
518 Display
520 Image sensing device
522 Sound sensing device
2600 Apparatus
2610 Intra prediction unit
2620 Determination unit
3100 Content supply system
3102 Capture device
3104 Communication link
3106 Terminal device
3108 Smartphone, tablet
3110 Computer, laptop
3112 Network video recorder (NVR), digital video recorder (DVR)
3114 TV
3116 Set-top box (STB)
3118 Video conferencing system
3120 Video surveillance system
3122 Personal digital assistant (PDA)
3124 In-vehicle device
3126 Display
3202 Protocol processing unit
3204 Demultiplexing unit
3206 Video decoder
3208 Audio decoder
3210 Subtitle decoder
3212 Synchronization unit
3214 Video / audio display
3216 Video / audio / subtitle display

Claims

1. A method of video decoding performed by a decoding device, the method comprising: performing entropy decoding, inverse quantization, and inverse transform on a bitstream to obtain residual information; performing an intra-prediction process on a block having samples to be predicted to obtain predicted sample values for the block, wherein an interpolation filter is applied to reference samples of the block during the intra-prediction process of the block; and obtaining reconstructed sample values of the block according to the residual information and the predicted sample values of the block, wherein the interpolation filter is selected from a set of interpolation filters used in the intra-prediction process based on the non-integer portion of a subpixel offset, the subpixel offset being an offset between the reference sample and the sample to be predicted and having an integer portion and a non-integer portion; and wherein the size of the primary reference side used in the intra-prediction process is determined according to the length of the interpolation filter and an intra-prediction mode, among a set of available intra-prediction modes, that yields the maximum subpixel offset, the maximum subpixel offset being the maximum offset between the reference sample and the sample to be predicted that occurs in the intra-prediction process of the block among the available intra-prediction modes, and the primary reference side comprising the reference sample.
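The last step of claim 1 combines the residual information with the predicted sample values. A minimal sketch, assuming (as in HEVC and VVC) that the sum is clipped to the valid sample range for the bit depth; the claim itself does not specify clipping, and `reconstruct` is a hypothetical name.

```python
def reconstruct(pred, resid, bit_depth=8):
    """Reconstructed sample = clip(predicted + residual) to [0, 2**bit_depth - 1]."""
    max_val = (1 << bit_depth) - 1
    return [
        [min(max(p + r, 0), max_val) for p, r in zip(prow, rrow)]
        for prow, rrow in zip(pred, resid)
    ]
```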

2. The method according to claim 1, wherein, if the intra-prediction mode is greater than the vertical intra-prediction mode VER_IDX, the side of the block of prediction samples is equal to the width of the block; or, if the intra-prediction mode is smaller than the horizontal intra-prediction mode HOR_IDX, the side of the block is equal to the height of the block.
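A sketch of the side selection in claim 2, assuming the VVC mode numbering in which HOR_IDX is 18 and VER_IDX is 50. The claim does not address modes from HOR_IDX to VER_IDX inclusive, so the hypothetical `block_side` returns None for them.

```python
# Assumed VVC intra mode indices: horizontal = 18, vertical = 50.
HOR_IDX = 18
VER_IDX = 50


def block_side(mode, width, height):
    """Side of the block used for the main reference side, per claim 2."""
    if mode > VER_IDX:
        return width       # modes beyond vertical predict from the top row
    if mode < HOR_IDX:
        return height      # modes below horizontal predict from the left column
    return None            # not covered by claim 2
```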

3. The method according to claim 1 or 2, wherein in the main reference side, the value of a reference sample located at a position larger than twice the size of the side of the block is set to be equal to the value of a sample located at twice the size of the side of the block.
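Claim 3 can be illustrated by building the main reference side from a list of available samples and repeating the sample at position 2 * side for every later position. The function name and the assumption that `samples` holds at least 2 * side + 1 entries are hypothetical.

```python
def build_main_ref_side(samples, side, ref_size):
    """Main reference side of ref_size entries; positions beyond 2 * side
    repeat the value of the sample located at position 2 * side."""
    limit = 2 * side
    return [samples[min(i, limit)] for i in range(ref_size)]
```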

4. The method according to any one of claims 1 to 3, wherein padding is performed by repeating the first and / or last reference samples of the main reference side on its left side and / or right side, respectively, as follows: if the main reference side is denoted ref and its size is denoted refS, the padding is expressed as ref[-1] = p[0] and / or ref[refS+1] = p[refS], where ref[-1] represents the value to the left of the main reference side, p[0] represents the value of the first reference sample of the main reference side, ref[refS+1] represents the value to the right of the main reference side, and p[refS] represents the value of the last reference sample of the main reference side.
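Python lists cannot be indexed at -1 without wrapping around, so this sketch of the claim 4 padding stores the padded side in a buffer with an offset of 1, so that `buf[off + i]` stands for ref[i]. It assumes that p holds p[0] to p[refS], i.e., refS + 1 samples, and that ref[i] = p[i] for 0 <= i <= refS; `pad_main_ref_side` is a hypothetical name.

```python
def pad_main_ref_side(p):
    """Return (buf, off) where buf[off + i] plays the role of ref[i] for
    i = -1 .. refS + 1, with ref[-1] = p[0] and ref[refS + 1] = p[refS]."""
    ref_s = len(p) - 1                    # p holds p[0] .. p[refS] (assumption)
    buf = [p[0]] + list(p) + [p[ref_s]]   # left pad, original samples, right pad
    return buf, 1
```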

5. The method according to any one of claims 1 to 4, wherein the interpolation filter used in the intra prediction process is a finite impulse response filter, and the coefficients of the interpolation filter are fetched from a lookup table.

6. The method according to any one of claims 1 to 5, wherein the interpolation filter used in the intra prediction process is a 4-tap filter.

7. The method according to claim 6, wherein the coefficients c0, c1, c2, and c3 of the interpolation filter are defined as shown in Table 1 according to the non-integer portion of the subpixel offset between the reference sample and the sample to be predicted, the “non-integer portion of subpixel offset” column being defined at 1 / 32 subpixel resolution.

8. The method according to claim 6, wherein the coefficients c0, c1, c2, and c3 of the interpolation filter are defined as shown in Table 2 according to the non-integer portion of the subpixel offset between the reference sample and the sample to be predicted, the “non-integer portion of subpixel offset” column being defined at 1 / 32 subpixel resolution.

9. The method according to claim 6, wherein the coefficients c0, c1, c2, and c3 of the interpolation filter are defined as shown in Table 3 according to the non-integer portion of the subpixel offset between the reference sample and the sample to be predicted, the “non-integer portion of subpixel offset” column being defined at 1 / 32 subpixel resolution.

10. The method according to claim 1, wherein the set of filters comprises a Gaussian filter and a cubic filter.

11. The method according to any one of claims 1 to 10, wherein the number of interpolation filters is N, the N interpolation filters are used for intra-reference sample interpolation, and N >= 1 and is a positive integer.

12. The method according to any one of claims 1 to 11, wherein the reference sample includes a sample that is not adjacent to the block.

13. A decoder comprising processing circuitry configured to perform the method according to any one of claims 1 to 12.

14. A computer-readable medium storing computer instructions for video decoding that, when executed by one or more processors, cause the one or more processors to perform a method comprising: performing entropy decoding, inverse quantization, and inverse transform on a bitstream to obtain residual information; performing an intra-prediction process on a block having samples to be predicted to obtain predicted sample values for the block, wherein an interpolation filter is applied to reference samples of the block during the intra-prediction process of the block; and obtaining reconstructed sample values of the block according to the residual information and the predicted sample values of the block, wherein the interpolation filter is selected from a set of interpolation filters used in the intra-prediction process based on the non-integer portion of a subpixel offset, the subpixel offset being an offset between the reference sample and the sample to be predicted and having an integer portion and a non-integer portion; and wherein the size of the main reference side used in the intra-prediction process is determined according to the length of the interpolation filter and an intra-prediction mode, among a set of available intra-prediction modes, that yields the maximum value of the subpixel offset, the maximum value of the subpixel offset being the maximum offset between the reference sample and the sample to be predicted that occurs in the intra-prediction process of the block among the available intra-prediction modes, and the main reference side comprising the reference sample.