Structure-aware non-local filtering
Adaptive weighted in-loop filters for video coding use local texture complexity to steer a similarity-based filter, which enhances picture quality, reduces distortion and improves coding efficiency.
Patent Information
- Authority / Receiving Office
- WO · WO
- Patent Type
- Applications
- Current Assignee / Owner
- GUANGDONG OPPO MOBILE TELECOMMUNICATIONS CORP LTD
- Filing Date
- 2024-12-25
- Publication Date
- 2026-07-02
Abstract
Description
STRUCTURE-AWARE NON-LOCAL FILTERING
TECHNICAL FIELD
[0001] The invention relates to the field of computer vision, in particular to the topic of video processing and video coding, more particularly to a method, a decoder, an encoder, and a computer-readable medium for in-loop filtering for picture enhancement in video coding.
BACKGROUND
[0002] Current video coding schemes such as H.265 / High Efficiency Video Coding (HEVC) and H.266 / Versatile Video Coding (VVC) apply so-called in-loop filters to the encoded video content inside the coding loop. These filters aim at concealing certain types of artifacts like blocking or at increasing the objective quality of the picture. This processing step is an integral part of the encoding and decoding system. Through this step, the quality of the decoder’s output pictures can be improved. Moreover, the filtered pictures are often used to predict subsequent pictures at both encoder and decoder. Therefore, the quality of subsequently coded pictures can be increased.
[0003] Filtering the video in order to increase the quality of the picture requires that there are statistical dependencies that can be exploited by the filtering system. In general, it makes sense to apply in-loop filtering if the quality improvement achieved by the filtering outweighs the signalling costs at the given rate-distortion (RD) point. Moreover, the computation time needs to be acceptable.
[0004] In a number of video coding systems, a series of filters are applied which address different types of coding errors. For example, there is the de-blocking filter which is applied at block borders to decrease blocking artifacts. In another example, there is the Sample Adaptive Offset (SAO) filter which is mainly designed to reduce ringing or blurring artifacts. Moreover, the Adaptive Loop Filter (ALF) can be used for an objective quality enhancement.
[0005] Most of these filters only use pixel values inside the filter support and, potentially, signalled filter values to find the filtered values. Local similarity values are currently only taken into account by bilateral in-loop filters. However, this type of filter is not capable of finding structural similarities as it uses sample difference as similarity measure.
[0006] Moreover, many linear filters and other techniques suffer from a blurring of the image content. These filters often work based on the assumption that content has more low-frequency components while noise has more high-frequency components. On this basis, it is assumed that low-pass filtering makes sense to improve overall quality. However, that comes at the cost of smoothing the content due to the attenuation of high-frequency components.
SUMMARY
[0007] Embodiments of the present application provide a method, a decoder, an encoder, and a computer-readable medium for video coding using weighted filters that overcome problems associated with conventional arrangements.
[0008] According to a first aspect, there is provided a method of processing video data to provide in-loop filtering, performed by an encoder, the method comprising: receiving an input picture comprising a plurality of pixels, each pixel having a respective pixel value; determining, for each pixel in the input picture, a texture complexity using a texture complexity function; applying a parameterised filter to each pixel, wherein one or more parameters for the filter are determined according to the texture complexity of the pixel, and wherein applying the parameterised filter comprises: determining, for each pixel, one or more similarity values to one or more respective filter sample pixels in a filter support area; and applying an aggregation function to each pixel, wherein the output of the aggregation function is dependent on the one or more similarity values determined for the pixel, wherein the one or more parameters determined according to the texture complexity of the currently processed pixel influence at least one of: determination of the one or more similarity values, and one or more parameters of the aggregation function.
[0009] Optionally, determining, for each pixel in the input picture, the texture complexity comprises: sectioning the input picture into a plurality of areas, each area comprising one or more pixels of the input picture; determining the texture complexity for each area in the plurality of areas using the texture complexity function.
[0010] Optionally, determining the texture complexity for each area in the plurality of areas using the texture complexity function comprises, for each of the plurality of areas: selecting a texture complexity support area for determining the texture complexity, wherein the texture complexity support area is larger than and comprises the area for which the texture complexity is being determined; determining the texture complexity by applying the texture complexity function to the texture complexity support area.
[0011] Optionally, the texture complexity function comprises local variance.
[0012] Optionally, the texture complexity is a texture complexity value.
[0013] Optionally, the texture complexity is a texture complexity class.
[0014] Optionally, determining the texture complexity comprises: determining a texture complexity value using the texture complexity function; applying a quantization function to the texture complexity value, wherein the output of the quantization function is the texture complexity class.
[0015] Optionally, the texture complexity class is based on a plurality of pre-defined texture complexity thresholds.
[0016] Optionally, the texture complexity class is based on a plurality of texture complexity thresholds, wherein the plurality of texture complexity thresholds are variable.
[0017] Optionally, determining, for each pixel, one or more similarity values to one or more respective filter sample pixels in a filter support area, comprises: determining the one or more similarity values using a template similarity.
[0018] Optionally, one or more template similarity parameters are selected in dependence on the texture complexity.
[0019] Optionally, the template similarity parameters comprise a template size.
[0020] Optionally, similarity values are determined for a subset of locations in the filter support area.
[0021] Optionally, for a subset of locations in the filter support area, the similarity value is set to a pre-defined value.
[0022] Optionally, the filter support area is diamond, square or circular shaped.
[0023] Optionally, the filter support area is selected based on rate-distortion optimization.
[0024] Optionally, the one or more similarity values are determined using mean squared error as similarity criterion.
[0025] Optionally, for each pixel in the input picture, the aggregation function determines the output of the aggregation function by adding a weighted sum over the one or more filter samples in the filter support area to the pixel value of the pixel, wherein for each pixel in the input picture, the one or more parameters of the aggregation function comprise one or more weights and one or more clipping values; and each term in the weighted sum comprises a weight of the one or more weights and a clipping function, wherein the clipping function clips the distance between the pixel value of the pixel and the filter sample pixel value of a current filter sample pixel and the clipping range of the clipping function is defined by a clipping value in the one or more clipping values.
[0026] Optionally, the one or more clipping values are selected from a set of pre-defined clipping values.
[0027] Optionally, the one or more clipping values are selected by rate-distortion optimization.
[0028] Optionally, for each term in the weighted sum the clipping value in the one or more clipping values is optimized.
[0029] Optionally, the optimized clipping value for each term of the weighted sum is selected from a set of pre-defined clipping values.
[0030] Optionally, the optimized clipping value for each term of the weighted sum is selected from a subset of the set of pre-defined clipping values; and wherein the subset is optimized across all clipping values in the weighted sum.
[0031] Optionally, the one or more clipping values are determined from a function fc dependent on the texture complexity and the similarity value between the pixel and a current filter sample pixel.
[0032] Optionally, a clipping value is determined for each term in the weighted sum, and the function fc further depends on a filter sample pixel value of a current filter sample pixel.
[0033] Optionally, the one or more weights for each pixel in the input picture are the same one or more weights for all pixels in the input picture; wherein the one or more weights are optimized using a least squares optimization on a matrix optimization system, wherein the matrix optimization system is defined as
[0034] wherein wj are the weights, Igt (xi, yi) is a ground truth for the ith pixel at position xi, yi in the picture, I (xi, yi) is the pixel value for the ith pixel at position xi, yi, vij is the jth filter sample pixel value for the ith pixel, cj is the clipping value corresponding to the term in the weighted sum that comprises wj, and fclip is a clipping function; for each pixel in the picture the filter samples are sorted such that their respective similarity values satisfy si(j-1) <= sij for all j.
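The matrix system itself does not appear in this text. One form that would be consistent with the symbol definitions above and with the aggregation function of paragraph [0025] is sketched below. It is an assumption, not the formula of the application; N (number of pixels) and M (number of filter samples per pixel) are symbols introduced here for illustration.

```latex
\begin{bmatrix}
I_{gt}(x_1,y_1) - I(x_1,y_1) \\
\vdots \\
I_{gt}(x_N,y_N) - I(x_N,y_N)
\end{bmatrix}
\approx
\begin{bmatrix}
f_{clip}\bigl(v_{11} - I(x_1,y_1),\, c_1\bigr) & \cdots & f_{clip}\bigl(v_{1M} - I(x_1,y_1),\, c_M\bigr) \\
\vdots & \ddots & \vdots \\
f_{clip}\bigl(v_{N1} - I(x_N,y_N),\, c_1\bigr) & \cdots & f_{clip}\bigl(v_{NM} - I(x_N,y_N),\, c_M\bigr)
\end{bmatrix}
\begin{bmatrix}
w_1 \\ \vdots \\ w_M
\end{bmatrix}
```

Under this reading, each row states that the current pixel value plus the weighted sum of clipped differences should approximate the ground truth, and the weights wj are the least squares solution of the overdetermined system.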
[0035] Optionally, for each pixel in the picture, the one or more weights are determined from a weighting function fw.
[0036] Optionally, the weighting function fw is expressed as
[0037] where i and j are the x and y coordinates of the current filter sample in the picture, k and l are the coordinates of the current pixel in the picture, sij is the similarity value between the current pixel and the current filter sample, and σd and σr are parameters controlling the impact of, respectively, distance and similarity between the current filter sample and the current pixel on the weight.
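The expression for fw does not appear in this text. Purely as an illustration, a bilateral-style weight that uses the listed symbols could be sketched as follows. The Gaussian form, the squaring of sij, the treatment of sij as a dissimilarity (0 meaning identical), and the function name are all assumptions and are not taken from the application.

```python
import math


def bilateral_style_weight(i, j, k, l, s_ij, sigma_d, sigma_r):
    """Hypothetical weight for the filter sample at (i, j) and current pixel (k, l).

    sigma_d controls how fast the weight falls off with spatial distance,
    sigma_r controls how fast it falls off with dissimilarity s_ij.
    """
    spatial_term = ((i - k) ** 2 + (j - l) ** 2) / (2.0 * sigma_d ** 2)
    similarity_term = (s_ij ** 2) / (2.0 * sigma_r ** 2)
    return math.exp(-spatial_term - similarity_term)
```

With this form, a filter sample at the position of the current pixel with sij = 0 gets weight 1, and the weight decreases as either the distance or sij grows.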
[0038] Optionally, applying a parameterised filter further comprises: applying one or more further aggregation functions to each pixel; and for each pixel, averaging over the output of the aggregation function and the outputs of the one or more further aggregation functions.
[0039] Optionally, the averaging is an adaptive averaging.
[0040] According to a second aspect, there is provided a computer-readable medium comprising computer executable instructions stored thereon which when executed by a computing device cause the computing device to perform any of the methods of the first aspect.
[0041] According to a third aspect, there is provided an encoder, comprising: one or more processors; and a computer-readable medium comprising computer executable instructions stored thereon which when executed by the one or more processors cause the one or more processors to perform any of the methods of the first aspect.
[0042] According to a fourth aspect, there is provided a non-transitory computer-readable medium and / or a computer program product storing a bitstream, the bitstream being generated using any of the methods of the first aspect.
[0043] According to a fifth aspect, there is provided a method of processing video data to provide in-loop filtering, performed by a decoder, the method comprising: receiving an input picture comprising a plurality of pixels, each pixel having a respective pixel value; receiving, for each pixel in the input picture, an indication of texture complexity; applying a parameterised filter to each pixel, wherein one or more parameters for the filter are determined according to the texture complexity of the pixel, and wherein applying the parameterised filter comprises: determining, for each pixel, one or more similarity values to one or more respective filter sample pixels in a filter support area; and applying an aggregation function to each pixel, wherein the output of the aggregation function is dependent on the one or more similarity values determined for the pixel, wherein the one or more parameters determined according to the texture complexity of the currently processed pixel influence at least one of: determination of the one or more similarity values, and one or more parameters of the aggregation function.
[0044] Optionally, determining, for each pixel, one or more similarity values to one or more respective filter sample pixels in a filter support area, comprises: determining the one or more similarity values using a template similarity.
[0045] Optionally, one or more template similarity parameters are selected in dependence on the texture complexity.
[0046] Optionally, the template similarity parameters comprise a template size.
[0047] Optionally, similarity values are determined for a subset of locations in the filter support area.
[0048] Optionally, for a subset of locations in the filter support area, the similarity value is set to a pre-defined value.
[0049] Optionally, the filter support area is diamond, square or circular shaped.
[0050] Optionally, the filter support area is selected based on rate-distortion optimization.
[0051] Optionally, the one or more similarity values are determined using mean squared error as similarity criterion.
[0052] Optionally, for each pixel in the input picture, the aggregation function determines the output of the aggregation function by adding a weighted sum over the one or more filter samples in the filter support area to the pixel value of the pixel, wherein for each pixel in the input picture, the one or more parameters of the aggregation function comprise one or more weights and one or more clipping values; and each term in the weighted sum comprises a weight of the one or more weights and a clipping function, wherein the clipping function clips the distance between the pixel value of the pixel and the filter sample pixel value of a current filter sample pixel and the clipping range of the clipping function is defined by a clipping value in the one or more clipping values.
[0053] Optionally, the one or more clipping values are selected from a set of pre-defined clipping values.
[0054] Optionally, the one or more clipping values are selected by rate-distortion optimization.
[0055] Optionally, for each term in the weighted sum the clipping value in the one or more clipping values is optimized.
[0056] Optionally, the optimized clipping value for each term of the weighted sum is selected from a set of pre-defined clipping values.
[0057] Optionally, the optimized clipping value for each term of the weighted sum is selected from a subset of the set of pre-defined clipping values; and wherein the subset is optimized across all clipping values in the weighted sum.
[0058] Optionally, the one or more clipping values are determined from a function fc dependent on the texture complexity and the similarity value between the pixel and a current filter sample pixel.
[0059] Optionally, a clipping value is determined for each term in the weighted sum, and the function fc further depends on a filter sample pixel value of a current filter sample pixel.
[0060] Optionally, the one or more weights for each pixel in the input picture are the same one or more weights for all pixels in the input picture; wherein the one or more weights are optimized using a least squares optimization on a matrix optimization system, wherein the matrix optimization system is defined as
[0061] wherein wj are the weights, Igt (xi, yi) is a ground truth for the ith pixel at position xi, yi in the picture, I (xi, yi) is the pixel value for the ith pixel at position xi, yi, vij is the jth filter sample pixel value for the ith pixel, cj is the clipping value corresponding to the term in the weighted sum that comprises wj, and fclip is a clipping function; for each pixel in the picture the filter samples are sorted such that their respective similarity values satisfy si(j-1) <= sij for all j.
[0062] Optionally, for each pixel in the picture, the one or more weights are determined from a weighting function fw.
[0063] Optionally, the weighting function fw is expressed as
[0064] where i and j are the x and y coordinates of the current filter sample in the picture, k and l are the coordinates of the current pixel in the picture, sij is the similarity value between the current pixel and the current filter sample, and σd and σr are parameters controlling the impact of, respectively, distance and similarity between the current filter sample and the current pixel on the weight.
[0065] Optionally, applying a parameterised filter further comprises: applying one or more further aggregation functions to each pixel; and for each pixel, averaging over the output of the aggregation function and the outputs of the one or more further aggregation functions.
[0066] Optionally, the averaging is an adaptive averaging.
[0067] According to a sixth aspect, there is provided a computer-readable medium comprising computer executable instructions stored thereon which when executed by a computing device cause the computing device to perform any of the methods of the fifth aspect.
[0068] According to a seventh aspect, there is provided a decoder, comprising: one or more processors; and a computer-readable medium comprising computer executable instructions stored thereon which when executed by the one or more processors cause the one or more processors to perform any of the methods of the fifth aspect.
[0069] By adapting parameters of a filter according to local texture complexity, where the filter utilises similarity values based on a support area of a currently processed pixel, improved performance can be obtained. In this way, structure aware and local similarity based filtering can be provided.
[0070] These and other aspects of the present application may become more readily apparent from the following description of the embodiments.
BRIEF DESCRIPTION OF THE DRAWINGS
[0071] Embodiments will now be described, by way of example only, with reference to the accompanying drawings, in which:
[0072] FIG. 1 shows a flowchart of operations of the in-loop filter system performed by an encoder according to an embodiment.
[0073] FIG. 2 shows a flowchart of operations of the in-loop filter system performed by a decoder according to an embodiment.
[0074] FIG. 3 shows an example of a texture complexity support area for texture complexity determination.
[0075] FIG. 4 shows an example of a quantization function for texture complexity classification.
[0076] FIG. 5 shows an example of a filter support area for similarity determination and aggregation.
[0077] FIG. 6 shows a schematic illustration of a decoder according to various embodiments.
[0078] FIG. 7 shows a schematic illustration of an encoder according to various embodiments.
DETAILED DESCRIPTION
[0079] Technical solutions in the embodiments of the present application will be described clearly and completely below with reference to the accompanying drawings.
[0080] These technical solutions may be applied to an H.265 / HEVC or H.266 / VVC video coding system (e.g. in an in-loop process where other filters such as an adaptive loop filter (ALF) and sample adaptive offset filter (SAO) are currently applied in such coding processes). However, it is to be understood that these technical solutions may be applied in any other video coding system that involves video compression. Furthermore, while these principles are primarily illustrated with reference to video processing, they are also applicable to other data forms, including image processing or even audio processing.
[0081] A “video” in the embodiments refers to one or more pictures. In other words, a video can include one picture or a plurality of pictures. A picture may also be referred to as an “image”.
[0082] An “encoder” is a device capable of encoding data into a bitstream, while a “decoder” is a device capable of decoding the bitstream in order to obtain the encoded data, or an approximation of the encoded data. A “bitstream” comprises a sequence of bits.
[0083] “Intra-prediction” and “inter-prediction” are two prediction operations that can be used within the HEVC and VVC or other coding frameworks for a decoder to process a received bitstream in order to obtain the original signal. In the embodiments, “original signal” or “original video” is used to refer to the data prior to encoding at the encoder. A reference sample in the embodiments may refer to spatially and / or temporally spaced picture data used for the prediction of a picture (or region of a picture). Intra- and inter-prediction operations are also used at the encoder to make rate-distortion decisions.
[0084] In more detail, intra-prediction involves the prediction of data spatially within a single picture, without a reference to other (temporally spaced) pictures. In other words, data for a first region of a picture is used in the prediction of the data for another region of the same picture, but there is no dependence on another temporally spaced picture. In this context, the data for the first region of the picture is considered a “reference sample”.
[0085] Inter-prediction involves the prediction of data between a plurality of temporally-spaced pictures. In other words, data for a first region of a first picture is used in the prediction of data for a second region of a second picture. The first and second region may or may not be spatially separated from one another. In this context, the data for the first region of the first picture is considered a “reference sample”. It is further noted that inter-prediction may sometimes use multiple reference regions from different pictures at once, i.e. for a single prediction operation.
[0086] A “residual” in the embodiments may refer to a value obtained based on an original value of a region of a picture and a prediction value of the region of the picture (e.g. the difference between the original value and the predicted value).
[0087] A “block” in the embodiments may refer to a portion of a picture. For example, a picture may be partitioned into two or more blocks. However, this is only an example. If a picture is not partitioned, then a “block” can refer to the entire picture.
[0088] A “filter” in the embodiments may refer to a filter that acts to enhance a signal. In general, a filter may reduce artifacts arising from coding errors. However, embodiments are not limited to this and the filter can instead be configured to provide alternative or additional enhancements in other embodiments.
[0089] The general optimization problem in video coding is to minimize the transmission rate and the distortion at the same time. A lower transmission rate leads to stronger and more visible distortions, which reduce the quality perceived by the viewer. The errors caused by the encoding are generally not random but caused by the processing steps in the encoder and decoder. For example, two important steps of a video coding system are prediction and transformation. Quantization of the transform coefficients can induce reconstruction errors. Many video coding systems employ a hybrid coding structure, where the content of a block is predicted by intra- or inter-prediction. This prediction is usually not perfectly accurate. Consequently, the difference between the ground-truth signal and the prediction (the residual) is calculated, transformed and encoded to compensate for the prediction error. The signal after the addition of the residual is filtered by so-called in-loop filters, examples of which are described in embodiments.
[0090] Embodiments of the invention provide a method and computer readable medium for a structure preserving in-loop filtering system for use in an encoder or decoder. The method applies a similarity-based parameterised filter where the filtering procedure is picture content adaptive. The picture content is classified using a local texture complexity classifier. Based on the result of the texture complexity classifier, a different set of filter parameters may be signalled or selected based on pre-defined criteria. For every location in the picture, a set of similar patches is found. This set of similar patches is used to filter the pixel at the current position. Thereafter, a weighted average of the pixels in the filter support region is calculated based on their corresponding similarity values.
[0091] Embodiments of the invention can provide structure aware and local similarity based filtering. Such methods can reduce coding errors without strongly affecting coded content. Benefits of these methods can be understood by recognising that there is little or no statistical dependency between coding errors, while the coded content is typically highly correlated.
[0092] Current non-local similarity-based approaches try to circumvent blurring and smoothing of picture content by utilizing local content similarities. The assumption is that very similar content can be found in a local (temporal or spatial) region around the currently processed location. If the content similarity is high while the noise variance is low and the noise is not strongly statistically dependent, a reduction of noise can be achieved by a weighted averaging of two or more similar samples. The number of filter samples and the averaging function depend on the content and the coding errors. This makes filtering less efficient if static filter parameters are used. Therefore, in embodiments described below, the optimal filter parameters depend on local characteristics of the content.
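The noise reduction obtained by averaging similar samples can be illustrated with an idealised case. Assume, for illustration only, that N samples share the same content value c and carry zero-mean, mutually uncorrelated noise n_i of equal variance; these assumptions and the symbols N, c, n_i and p_i are not taken from the application.

```latex
p_i = c + n_i, \qquad \mathrm{E}[n_i] = 0, \qquad \mathrm{Var}(n_i) = \sigma^2, \qquad \mathrm{Cov}(n_i, n_j) = 0 \ (i \neq j)

\bar{p} = \frac{1}{N} \sum_{i=1}^{N} p_i = c + \frac{1}{N} \sum_{i=1}^{N} n_i
\quad\Longrightarrow\quad
\mathrm{E}[\bar{p}] = c, \qquad \mathrm{Var}(\bar{p}) = \frac{\sigma^2}{N}
```

In this idealised case the content is preserved while the noise variance drops by a factor of N. The gain shrinks when the samples are less similar or the errors are correlated, which is why the number of samples and the weights need to follow the local content.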
[0093] To deal with this problem and to increase efficiency, locally adaptive filters can be used. In embodiments, a local texture complexity classifier is used. Depending on the local texture complexity, a different set of filter parameters is used by the encoder or signalled to the decoder. The adaptive filter parameters may include the support region of the filter, the aggregation parameters and the size of the template for local similarity calculation. Non-local similarity based approaches are usually very complex in comparison to other filtering approaches. The complexity depends on the template size and the support region of the filter. However, reducing the template size may impact the filtering performance.
[0094] FIG. 1 shows a flow chart 100 of operations of a structure aware filtering system 110 performed by an encoder 70 according to an embodiment (example hardware of encoder 70 shown in FIG. 7). FIG. 2 shows a flow chart 200 of operations of a structure aware filtering system 210 performed by a decoder 60 according to an embodiment (example hardware of decoder 60 shown in FIG. 6).
[0095] As shown in FIG. 1, the filtering system 110 comprises a parameterised filter 120 which acts to filter pixels based on the output of an earlier stage.
[0096] The operations of FIG. 1 start at step 102, where the filtering system 110 receives an input picture. The input picture may be a reconstructed picture from a decoded residual of a picture and / or a prediction of the picture. The input picture may be a whole or part of a picture or one of a plurality of blocks in a picture. The input picture may be part of video data containing multiple pictures or an individual image. Combinations of the foregoing input picture options or other pictures than explicitly listed are possible.
[0097] At step 104, the filtering system 110 determines, for each pixel in the input picture, a texture complexity using a texture complexity function. Texture complexity may be separately determined for each individual pixel in the input picture or texture complexity may be calculated for groups of pixels in combination. In some examples, the input picture may be sectioned into a plurality of areas, each area comprising one or more pixels, where each pixel in the area has the texture complexity determined for the area as a whole. The texture complexity may be determined by analysing a texture complexity support area larger than and comprising the pixel or the area for which the texture complexity is to be determined. The texture complexity function may be a local variance, an average local gradient magnitude, a Gaussian weighted local variance or any other suitable criterion measured for the support area. It may also be calculated as any suitable (weighted) combination of such criteria.
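As a minimal sketch of the local variance option described above, the following code assigns each area of a picture the variance measured over a larger support window. The function names, the square areas, the `margin` parameter and the cropping of the window at picture borders are assumptions made for illustration.

```python
def local_variance(picture, top, left, size, margin):
    """Variance of the pixel values in a support window around one area.

    The window covers the size x size area at (top, left) plus `margin`
    extra pixels on every side, cropped at the picture borders.
    """
    height, width = len(picture), len(picture[0])
    rows = range(max(0, top - margin), min(height, top + size + margin))
    cols = range(max(0, left - margin), min(width, left + size + margin))
    values = [picture[r][c] for r in rows for c in cols]
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)


def texture_complexity_map(picture, size=2, margin=1):
    """Give every pixel the local variance determined for its area."""
    height, width = len(picture), len(picture[0])
    complexity = [[0.0] * width for _ in range(height)]
    for top in range(0, height, size):
        for left in range(0, width, size):
            value = local_variance(picture, top, left, size, margin)
            for r in range(top, min(height, top + size)):
                for c in range(left, min(width, left + size)):
                    complexity[r][c] = value
    return complexity
```

With `margin=0` the support window coincides with the area itself. A positive margin lets an area that is flat on its own still register a nearby edge.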
[0098] The texture complexity function applied to the support area may output a texture complexity value as the texture complexity for each pixel. The texture complexity value for each pixel may be classified into one of a set of classes to provide the final texture complexity. For example, a texture complexity function may output a texture complexity value that is quantized to determine a texture complexity class. The texture complexity class may be determined based on whether the texture complexity value satisfies one or more thresholds out of a plurality of thresholds. These thresholds may be pre-defined, chosen from a pre-defined set of possible thresholds, or optimized based on the intended application or using rate-distortion optimization. Other methods of determining a texture complexity using a texture complexity function are possible. Determining the texture complexity will be discussed further in the context of FIGS. 3 and 4.
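The quantization of a texture complexity value into a class can be sketched as a threshold lookup. The function name and the threshold values below are hypothetical; the text leaves open whether thresholds are pre-defined or optimized.

```python
from bisect import bisect_right


def texture_class(value, thresholds):
    """Map a texture complexity value to a class index.

    `thresholds` must be sorted in ascending order. N thresholds give
    N + 1 classes; a value equal to a threshold falls into the upper class.
    """
    return bisect_right(thresholds, value)


# Hypothetical thresholds, chosen only for this example.
EXAMPLE_THRESHOLDS = [10.0, 100.0, 1000.0]
```

For instance, `texture_class(50.0, EXAMPLE_THRESHOLDS)` yields class 1, the second of four classes.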
[0099] Parameterised filter 120, which is applied to each pixel in the input picture, receives the texture complexity determined for each pixel. One or more of the parameters of parameterised filter 120 are determined according to the texture complexity of the pixel. In this way, the filter operates according to the local conditions of the pixel. As described below, the parameters of the filter control one or more parameters of a similarity calculation and one or more parameters of an aggregation function. Steps 106 and 108 are performed in parameterised filter 120.
[0100] At step 106, the parameterised filter 120 determines, for each pixel, one or more similarity values between the currently processed pixel and one or more respective filter sample pixels in a filter support area. This similarity calculation may have one or more parameters selected on the basis of the texture complexity of the currently processed pixel. For example, the filter support area may have a geometry (for example, size) which is dependent upon the texture complexity determined for the currently processed pixel. As such, the support area size may be determined according to the texture complexity. Rate-distortion optimization may be applied to identify the appropriate geometry for the support area for a particular texture complexity (either value or class).
[0101] The filter support area may have a variety of shapes. In some examples, the filter support area may be diamond, square or circular shaped. As noted above, the geometry of filter support area may be controlled by parameters of the filter in dependence on the texture complexity. Moreover, parameters of the filter may also be selected in dependence on, for example, coding quality, frame type (intra or inter frame) or picture size.
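To make the support area geometries concrete, the sketch below lists the filter sample positions of a diamond, square or circular support as offsets from the currently processed pixel. The function name, the `radius` parameter, the exclusion of the centre pixel and the per-class table are assumptions for illustration; the table values are not taken from the application.

```python
def support_offsets(shape, radius):
    """Return (dy, dx) offsets of the filter sample pixels, centre excluded."""
    offsets = []
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            if dy == 0 and dx == 0:
                continue  # the current pixel itself is not a filter sample here
            if shape == "square":
                inside = True
            elif shape == "diamond":
                inside = abs(dy) + abs(dx) <= radius
            elif shape == "circle":
                inside = dy * dy + dx * dx <= radius * radius
            else:
                raise ValueError(f"unknown shape: {shape}")
            if inside:
                offsets.append((dy, dx))
    return offsets


# Hypothetical mapping from texture complexity class to (shape, radius).
SUPPORT_BY_CLASS = {0: ("diamond", 1), 1: ("diamond", 2), 2: ("square", 3)}
```

A lookup such as `support_offsets(*SUPPORT_BY_CLASS[texture_class_index])` would then give the sample positions for a pixel of the given class. Which geometry suits which class is left to the selection criteria described in the text, for example rate-distortion optimization.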
[0102] The similarity values may be determined using any appropriate method. The relevant values for comparison may be the direct pixel values themselves, such that each similarity value may be determined as the similarity between the value of the currently processed pixel and the value of a filter sample pixel. However, in other examples a similarity template may be used to obtain each value for comparison; in this way, the pixel values across a similarity template centred on the currently processed pixel may be compared with pixel values across the similarity template when centred on the filter sample pixels in order to obtain the similarity values.
[0103] In such examples, the geometry (for example, size) of the similarity template may vary according to the texture complexity of the currently processed pixel. For example, the template size may be determined according to the texture complexity. The template size may be selected based on rate-distortion optimization.
[0104] The similarity may be determined using any appropriate similarity or dissimilarity criteria. Examples of similarity criteria are mean squared error, mean absolute error or correlation.
[0105] The filter sample pixels for which similarity values are calculated may be a subset of all pixels in the filter support area. Alternatively, similarity values may be calculated for all the pixels in the filter support area. In some examples, for a subset of the one or more filter sample pixels the similarity value is set to a pre-defined value. In some examples, filter sample pixels that do not fulfil a similarity requirement may not be considered for further processing or the similarity value may be set to a pre-defined value.
[0106] Determination of similarity values is discussed in more detail with respect to FIG. 5 below.
[0107] At step 108 the parameterised filter 120 applies an aggregation function to each pixel. The output of the aggregation function is dependent on the one or more similarity values determined for the pixel (i.e., a currently processed pixel or current pixel). The output of the aggregation function may also depend on pixel values, filter sample pixel values and / or the distance between the pixel and the filter sample pixels. Moreover, parameters of the aggregation function may be determined according to the texture complexity of the pixel. The aggregation function may be the sum of the pixel value of the current pixel and a weighted sum over the one or more filter sample pixels in the filter support area for the current pixel. In some embodiments, for each pixel one or more weights and clipping values are defined and one or more of these may be dependent on the texture complexity for the currently processed pixel. These may be used in the terms of the weighted sum. For example, in an embodiment each term of the weighted sum comprises a weight and a clipping function that clips the difference between the pixel value of the currently processed pixel and the filter sample pixel value of a current filter sample pixel to a clipping range defined by a clipping value.
[0108] In some embodiments, a single clipping value may be used across all terms in the weighted sum of the aggregation function for a current pixel. This clipping value may be selected from a set of pre-defined clipping values. For example, the clipping value may be selected by rate-distortion optimization. In some embodiments, clipping values for each term in the weighted sum are optimized. These clipping values may be selected from a set of pre-defined clipping values or from a subset of the pre-defined clipping values optimized across all clipping values in the weighted sum. In some embodiments, clipping values are determined from a function dependent on the texture complexity and the similarity value between the pixel and the current filter sample pixel. This function may also depend on the filter sample pixel value of the corresponding current filter sample pixel.
[0109] In some embodiments, the one or more weights for each pixel in the input picture are the same one or more weights for all pixels in the input picture. In such embodiments, the one or more weights may optionally be optimized using a least squares optimization on a matrix optimization system devised taking the similarity values between each pixel and its respective set of filter sample pixels into account. In some examples, the filter samples may be sorted in the matrix optimization system based on their similarity to the currently processed pixel. In some examples, the least similar filter samples may be dropped.
[0110] In some embodiments, the one or more weights for each pixel in the input picture are determined independently and are therefore generally different from each other. This does not preclude weights being identical either coincidentally or because they were selected from a predefined set of weights. In some examples, for each pixel in the picture, the one or more weights are determined from a weighting function. Such a weighting function may take any suitable form. For example, this weighting function may be a bilateral weighting function and / or a parametric exponential function.
[0111] In some embodiments, more than one aggregation function may be applied to each pixel. The output of all aggregation functions may be averaged. For example, the averaging may be performed adaptively. The averaging may depend on the coding parameters.
[0112] A more in-depth discussion of applying the aggregation function can be found below.
[0113] FIG. 2 shows a flow chart 200 of operations of a structure aware filtering system 210 performed by a decoder. As shown in FIG. 2, the filtering system 210 implemented at the decoder comprises a parameterised filter 220 in line with the equivalent structure implemented at the encoder in FIG. 1.
[0114] Steps 202, 206 and 208 operate in the same manner at the encoder and the decoder. As such, the description of these steps provided with regards to FIG. 1 above, and the related description below, is equally applicable at the decoder. However, rather than determining a texture complexity as at step 104, at the decoder an indication of texture complexity is received at step 204 from the encoder-side. That is, the encoder may signal the determined texture complexity for each pixel derived at step 104 which is then received by the decoder at step 204 and subsequently implemented at steps 206 and 208.
[0115] The indication of texture complexity may comprise texture complexity values and / or texture complexity class for each pixel. Alternatively, it may comprise one or more additional values which have been derived from the texture complexity. In particular, parameters for the parameterised filter may be signalled directly if considered appropriate by the skilled person. Nevertheless, since at least one of such parameters is dependent on the determined texture complexity, this still serves as an indication of texture complexity.
[0116] In one or more alternative embodiments, the step 204 may be performed in the same manner as step 104; that is, in alternatives the decoder may directly calculate the texture complexity from the input picture in the same manner as performed at the encoder.
[0117] Texture complexity determination
[0118] The goal of the texture complexity function (e.g. as used at step 104) is to spatially separate the picture such that the different texture complexity classes are optimally separated with respect to their associated filter parameters. This may be beneficial since local texture complexity may be a good predictor for the filter parameters which provide improved performance.
[0119] For example, it can be assumed that there are a large number of similar patches for simple content in a local neighborhood. However, in regions of complex content it can reasonably be assumed that the content is highly textured or has a stronger structure. As such, similar patches are less likely to be found and the expected error between the current and the matched patch is higher. On the other hand, the coding error is usually larger for complex content compared to simple content. Moreover, the spatial distribution of similar patches is highly dependent on the texture complexity. For at least these reasons, selecting different filter parameters based on the texture complexity of the current region / pixel can offer improved performance.
[0120] In some embodiments, the texture complexity is classified by the local variance E[(X − E[X])²]. Other texture complexity measures such as the average local gradient magnitude or a Gaussian-weighted local variance may be used to achieve a similar effect. Similar alternative measures, as well as combinations of the same, may be used to derive a texture complexity. The local texture complexity may be calculated for each pixel in a picture, for blocks of size m×n in order to reduce computational complexity, or for a combination of individual pixels and blocks of varying sizes. The local texture complexity is calculated for a texture complexity support area of (m+ys) × (n+xs). Thereby, xs and ys may be non-negative multiples of two. Note that this notation also includes the case where the texture complexity is calculated for each pixel if m=1 and n=1. If the texture complexity is calculated for a larger area with m>1 or n>1, the same (calculated) texture complexity value is assigned to all pixels in the area. This reduces computational complexity.
[0121] A larger support area for calculating the texture complexity is used to get a better approximation of the local texture complexity. The support area should not be too large, because the texture complexity is not location independent. Within local neighborhoods, however, most signals have similar statistical properties. In a more general scenario, any support area that is larger than or equal to the currently processed area may be used. A support area equal to the currently processed area may be particularly advantageous when the currently processed area of size m×n is already large. Moreover, a support area equal to the currently processed area may be advantageous if the computational complexity is very constrained. A circular shaped support area may decrease average distance and may improve coding performance.
[0122] Figure 3 shows an example for the texture complexity calculation in a picture 30. The area 31 for which the texture complexity is calculated is marked in dark grey and the hatched area is the support area 32 which is used to calculate the texture complexity. In embodiments, the support area 32 includes the area 31 for which the texture complexity is calculated. In this example the support area 32 extends beyond the area 31 in all directions. However, this need not be the case everywhere in the picture (e.g. at edges of the picture). Parameters for this scenario are m=n=2, xs=ys=2. When applying local variance as texture complexity criterion, the following formula is used to calculate the texture complexity for the dark grey pixel locations 31. For this case, the value of the bottom left pixel at position x=y=0 is denoted as v0,0 and the value of the top right pixel at position x=y=7 is denoted as v7,7. Given that, the texture complexity value t for the dark grey pixels v3,3, v3,4, v4,3, v4,4 is calculated as follows.
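The formula itself is not reproduced in this text. As a minimal sketch of the calculation described for Figure 3, the following Python function computes the local variance over the support area of an m×n block. It assumes that the support area extends xs/2 columns and ys/2 rows beyond the block on each side and is cropped at the picture border; the function name `local_variance` and the row-major `picture[y][x]` layout are illustrative choices, not taken from the source.

```python
def local_variance(picture, x0, y0, m, n, xs, ys):
    """Local variance over the (m+ys) x (n+xs) support area of an m x n block.

    picture[y][x] holds pixel values and (x0, y0) is the first pixel of the
    block (m rows, n columns). The symmetric extension by xs/2 and ys/2 and
    the cropping at the picture border are assumptions of this sketch.
    """
    height, width = len(picture), len(picture[0])
    x_lo, x_hi = max(0, x0 - xs // 2), min(width, x0 + n + xs // 2)
    y_lo, y_hi = max(0, y0 - ys // 2), min(height, y0 + m + ys // 2)
    samples = [picture[y][x] for y in range(y_lo, y_hi) for x in range(x_lo, x_hi)]
    mean = sum(samples) / len(samples)
    return sum((v - mean) ** 2 for v in samples) / len(samples)


# Parameters of the Figure 3 example: 8x8 picture, block covering
# v3,3 .. v4,4, m = n = 2, xs = ys = 2, so the support covers positions 2..5.
ramp = [[x for x in range(8)] for _ in range(8)]
t = local_variance(ramp, 3, 3, 2, 2, 2, 2)
```

The single value `t` would then be assigned to all four pixels of the block, as described above for areas with m>1 or n>1.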
[0123] The resulting texture complexity value may have a very wide range of values. This works well if the filter parameters (e.g., of parameterised filter 120 or 220) are chosen by a function of the local texture complexity. However, this is not feasible if filter parameters are signalled by the encoder to the decoder (e.g., encoder 70 and decoder 60) for each texture complexity value. For that reason texture complexity ranges can be defined in order to classify the texture complexity values into a plurality of texture complexity classes. For each texture complexity range / class, a different parameter set is optimized by the encoder and signalled to the decoder. For example, a quantization function may be defined as shown in Figure 4.
[0124] In this example, there are four texture complexity classes corresponding to four texture complexity ranges, namely c0: [0, 10], c1: [10, 40], c2: [40, 150], c3: [150, ∞]. This is of course only an example and the ranges may be set depending on the application. First, optimal ranges depend on the bit-depth of the video: if the bit-depth is higher, the texture complexity ranges need to be scaled accordingly. Second, the ranges depend on the characteristics of the video and the settings of the coding system. Moreover, the optimal setting may depend on the distribution of complex to simple content in the picture. If there is a lot of simple content, assigning more classes to the low texture complexity content might be preferable. In general, the texture complexity thresholds (in this case 10, 40 and 150) and the number of texture complexity thresholds may be set depending on the demands of the application or based on rate-distortion (RD) optimization. In rate-distortion optimization, a cost of the form R+λD is minimized to get the best threshold. In this case, R is the rate required to transmit all filter parameters and thresholds and D is the distortion after the filtering operation. The rate and the distortion depend on the filter parameters for the filters (e.g. filters in parameterized filter 120 or 220) of each texture complexity class. For each texture complexity class, these filters are also RD-optimized in an independent process. For signalling to the decoder by the encoder, two variants may be used. In the first variant, a filter is optimized by the encoder and signalled to the decoder for each texture complexity class. In the second variant, a set of filters is optimized by the encoder and an index indicating the used filter is signalled to the decoder for each texture complexity class. This allows the same filter to be shared by more than one texture complexity class without the requirement to signal the filter twice.
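The quantization of texture complexity values into classes can be sketched as a threshold lookup. The example below uses the thresholds 10, 40 and 150 from the text. Since the stated ranges share their boundary values, the sketch assumes that a value equal to a threshold falls into the higher class; the name `texture_class` is illustrative.

```python
from bisect import bisect_right

# Example thresholds from the text; they would be scaled for other bit-depths.
THRESHOLDS = [10, 40, 150]


def texture_class(t, thresholds=THRESHOLDS):
    """Return the class index (0 for c0, 1 for c1, ...) of complexity value t.

    Assumption: a value equal to a threshold belongs to the higher class.
    """
    return bisect_right(thresholds, t)


classes = [texture_class(t) for t in (3.5, 10, 75, 4000)]
```

Each class index can then select the filter, or the index of a shared filter, that was signalled for that class.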
[0125] Similarity Function
[0126] For each to-be-filtered location in the parameterized filter (e.g. filter 120 or 220), the similarity of the center template to all or a subset of shifted locations within a filter support area (set of shifted locations) may be calculated. That means the similarity value may not be calculated for all locations within the filter support area. In this case, the similarity for the remaining locations can be replaced by pre-defined values. This can be used to reduce computational complexity by excluding a subset of locations from the similarity calculation and replacing the similarity values by pre-defined values. This could be useful for very close pixels which are expected to have high similarity on average. This process results in a list of pixel values vi and corresponding similarity values si.
[0127] An example is shown in Figure 5. The light grey square represents the currently processed pixel 51 in picture 50, i.e. the to-be-filtered pixel. The dark grey dots represent the support area. These are all positions for which the similarity needs to be calculated. Each filter sample pixel 52 in the support area generates a pair of pixel value vi (the value of the pixel in the picture 50 at this position) and a similarity si (the similarity between the template at the location of the to-be-filtered pixel 51 and the template at the dark grey dot 52). Thus, for each to-be-filtered pixel a table of filter sample pixel values and corresponding similarity values is generated.
[0128] These pixels 52 may be used in an averaging in the aggregation function. The similarity is calculated by applying some similarity metric to a template around the current (light grey) pixel 51 and a template at the currently processed location within filter sample support area 52 (dark grey dot) . In some embodiments, the template around current pixel 51 and the template around currently processed filter sample 52 in the support area have the same shape.
[0129] In some examples, the similarity measure may not be a template similarity. In such examples, the similarity may be calculated between the values of current pixel 51 and current filter sample pixel 52.
[0130] The support area does not need to have a shape as shown in the example of Fig. 5. In general any arrangement and number of samples is possible. Diamond shaped filters (as in the example) are frequently used in video coding. The size may be adaptively selected. To achieve a better removal of the noise, a larger support area or less densely distributed samples may be used. The support area may be predefined based on encoder parameters such as coding quality, frame type (intra or inter frame) or picture size. Also, coding block (coding unit, prediction unit or transform unit) based adaption could be used. Thereby, coding parameters of the block might be used. Moreover, the texture complexity may be used to derive the optimal support area and template size. Another option is to rate-distortion optimize and signal both parameters in the bitstream. As similarity metric, any function that measures similarity or dissimilarity may be used. Some examples are mean squared error, mean absolute error or correlation.
[0131] In one embodiment, the similarity function may use the Mean Squared Error (MSE) as a similarity criterion. In this case, the similarity function fsim to determine a template similarity value between a current pixel 51 at location (x, y) in a picture 50 and a currently processed filter sample pixel 52 offset from the current pixel by ox and oy has the following form.
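The equation is not reproduced in this text. As a hedged sketch of an MSE template similarity, the following Python code compares a square template centred on the current pixel with the same template centred on the shifted location, and builds the table of (vi, si) pairs described for Figure 5. The square template, the clamping at picture borders, the default radius and all function names are assumptions made for illustration.

```python
def pixel_at(picture, px, py):
    """Read picture[py][px], clamping coordinates to the picture (assumption)."""
    height, width = len(picture), len(picture[0])
    return picture[min(max(py, 0), height - 1)][min(max(px, 0), width - 1)]


def template_mse(picture, x, y, ox, oy, radius=1):
    """MSE between the template centred on (x, y) and the one on (x+ox, y+oy).

    Lower values mean higher similarity.
    """
    total, count = 0, 0
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            diff = pixel_at(picture, x + dx, y + dy) - pixel_at(picture, x + ox + dx, y + oy + dy)
            total += diff * diff
            count += 1
    return total / count


def similarity_table(picture, x, y, offsets, radius=1):
    """List of (v_i, s_i) pairs for all filter sample offsets of pixel (x, y)."""
    return [(pixel_at(picture, x + ox, y + oy), template_mse(picture, x, y, ox, oy, radius))
            for ox, oy in offsets]


# Diamond-shaped support area of radius 2 around the current pixel.
DIAMOND = [(ox, oy) for ox in range(-2, 3) for oy in range(-2, 3)
           if 0 < abs(ox) + abs(oy) <= 2]
```

Calling `similarity_table(picture, x, y, DIAMOND)` yields one pair per dark grey dot of the support area, which is the input expected by the aggregation step.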
[0132] Aggregation Function
[0133] The aggregation function (e.g., the aggregation function applied at steps 108 or 208) is a function which calculates the resulting pixel value from the set of samples from the support region (e.g., filter samples 52 in the support region shown in Fig. 5) . Besides the pixel value, each sample has a set of attached parameters. These may include the template similarity and the distance to the currently processed pixel. That means, one input to the aggregation function is a list of filter sample pixel values vi, similarity values si and distance values di (optional) .
[0134] The aggregation function is a parametric function. Parameters of the aggregation function may depend on the determined (by the encoder) or signalled (by the encoder to the decoder) texture complexity of the currently processed pixel. The same parameters are used by the encoder as are signalled to the decoder for use in the decoder. In some embodiments, the aggregation function has the following form:
[0135] In this equation, f(x, y) is the value of the current pixel 51, vi are the filter sample pixel values, wi and ci are the respective weights and clipping values for each filter sample, and fclip is a clipping function that clips the difference between the value of the current pixel 51 and the value of each filter sample pixel 52. The sum is over all filter samples. The output of the aggregation function is the filtered value of the current pixel 51.
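The equation is not reproduced in this text. Based on the description above (the current pixel value plus a weighted sum of clipped differences), a minimal sketch is given below. It assumes that the difference is taken as filter sample minus current pixel and that fclip clips symmetrically to the range [-ci, ci]; the names `f_clip` and `aggregate` are illustrative.

```python
def f_clip(d, c):
    """Clip the difference d to the range [-c, c] (symmetric range assumed)."""
    return max(-c, min(c, d))


def aggregate(current, samples, weights, clips):
    """Filtered value: f(x, y) + sum_i w_i * f_clip(v_i - f(x, y), c_i).

    The sign convention (sample minus current pixel) is an assumption.
    """
    return current + sum(w * f_clip(v - current, c)
                         for v, w, c in zip(samples, weights, clips))


# Two filter samples: the first difference (+4) fits its clipping range,
# the second (-10) is clipped to -4 before weighting.
filtered = aggregate(100, [104, 90], [0.5, 0.25], [8, 4])
```

Clipping limits the influence of filter samples that differ strongly from the current pixel, which is why the clipping values are natural candidates for adaptation by texture complexity class.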
[0136] Clipping Values of the Aggregation Function
[0137] The clipping values may be signalled from the encoder to the decoder by texture complexity class, or may be a function of the texture complexity class. Also, the clipping values may be filter sample pixel dependent. That means that the clipping values may differ for each filter sample. In some examples, these could be signalled to the decoder or they could be defined by a function of the similarity, where both encoder and decoder use the same function. In the following, three options for clipping value derivation are presented. Other methods of determining the clipping values may be used.
[0138] The first option is signalling one clipping value to the decoder. In this scenario, a set of n clipping values C = {c0, c1, ..., cn} is defined. The encoder tests all clipping values in this set in an RD-optimization and signals the index i of the best clipping value for a current to-be-filtered pixel 51 to the decoder. That is, in this version each clipping value ci for a to-be-filtered pixel 51 is the same clipping value.
[0139] The second option is signalling a set of multiple clipping values. In this scenario, for each weighting coefficient, a different clipping value may be defined. That is, for each term in the weighted sum of the aggregation function a different clipping value is defined. The clipping values are determined in an optimization at the encoder such that they are optimal or close to an optimal solution. This also implies that all the clipping values need to be signalled to the decoder. In this version, the set of clipping values C may be pre-defined, or a subset of clipping values may be selected from the set C in order to reduce signalling costs. Given that there are n clipping values and k weighting coefficients, ld(n)·k bits would be required to signal the clipping values, where ld denotes the base-2 logarithm. Reducing the number of clipping values n by pre-selecting the most useful clipping values might significantly reduce coding costs, while maintaining coding efficiency. The most useful clipping values may be selected by optimizing the subset across all values in the weighted sum of the aggregation function.
[0140] A clipping value selection for a filter setup with clipping values per filter weighting coefficient may be as follows.
[0141] A subset of m clipping values is selected from a larger set of n clipping values first. Then, the clipping value in the smaller subset is optimized and signalled for each of the k filter coefficients wk.
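The effect of the pre-selection on the signalling cost can be illustrated with a short calculation. The sketch below assumes a fixed-length code for each clipping index, so the number of bits per index is rounded up when the set size is not a power of two; the cost of signalling the chosen subset itself is not modelled. The function name and the example sizes are hypothetical.

```python
from math import ceil, log2


def clip_index_bits(num_clip_values, num_coeffs):
    """Bits needed to signal one clipping index per filter coefficient.

    Assumes a fixed-length code, i.e. ceil(ld(n)) bits per index.
    """
    return ceil(log2(num_clip_values)) * num_coeffs


# Hypothetical sizes: k = 12 coefficients, full set of n = 8 clipping values
# versus a pre-selected subset of m = 2 clipping values.
bits_full = clip_index_bits(8, 12)
bits_subset = clip_index_bits(2, 12)
```

In this illustration the subset reduces the per-coefficient cost from three bits to one bit, at the price of additionally identifying which m values were selected.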
[0142] A third method is to estimate the clipping values from a (parametric) function. This can be done per filter coefficient. Therein, a function is used to derive the clipping value ci. The function may depend on the local texture complexity value, the similarity values, the filter values and steering parameters signalled by the encoder to the decoder. The intention of this method is to reduce signalling costs, as the clipping values no longer need to be signalled directly by the encoder. However, it might be less optimal than optimizing and signalling the clipping values directly.
[0143] Weighting Coefficients of the Aggregation Function
[0144] The values wi may be generated in different ways depending on the design of the aggregation function.
[0145] In embodiments, a first method is to optimize the values wj. This is easiest if there is a fixed number of filter samples 52 for each of the m to-be-filtered pixels 51. That means that either all samples or only the n most similar samples are used for filtering each pixel 51. With that, the filter coefficients wj can be determined from a matrix-vector system that is solved by least squares optimization.
[0146] For example the matrix vector optimization system may be as follows.
[0147] I(xi, yi) is the image / picture value for the i-th filter location / current pixel 51 at position (xi, yi) in the picture 50. Igt(xi, yi) is the ground truth for the i-th filter location / current pixel 51 at position (xi, yi) in the picture 50. vij is the pixel value of the j-th filter sample 52 for the i-th filter location / to-be-filtered pixel 51. wj are the weights, cj are the clipping values corresponding to the terms in the weighted sum that comprise wj, and fclip is a suitable clipping function. In some examples, ground truth Igt(xi, yi) is the ground truth pixel value at position (xi, yi) in ground truth picture Igt. In some examples, the ground truth image / picture Igt is the uncoded picture received by the encoder for which picture I is a reconstruction. In such examples, by solving the matrix optimization system, the weights of the aggregation function are optimized such that the weighted sum of the clipped difference values is as close as possible to the residual determined by the encoder from the uncoded picture.
[0148] In this optimization, the values vij, for fixed i, are sorted such that their corresponding similarity values sij satisfy si(j-1) ≤ sij for all j. This is advantageous in the optimization as the statistical properties depend on the similarity. This can be exploited better if the values are sorted by their similarity value such that the weight wj always corresponds to the j-th most similar value.
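The matrix-vector system is not reproduced in this text. The sketch below shows one way such a least squares fit could be set up: each filter location contributes one row of clipped differences (sorted by similarity value, most similar first) and one target, the residual Igt − I. It solves the normal equations with a small Gaussian elimination in plain Python. The sign convention of the difference, the symmetric clipping and all names are assumptions, and a singular system is not handled.

```python
def f_clip(d, c):
    """Clip the difference d to [-c, c] (symmetric range assumed)."""
    return max(-c, min(c, d))


def solve_linear(a, b):
    """Solve a*w = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    w = [0.0] * n
    for r in range(n - 1, -1, -1):
        acc = aug[r][n] - sum(aug[r][c] * w[c] for c in range(r + 1, n))
        w[r] = acc / aug[r][r]
    return w


def fit_weights(recon, ground_truth, samples, similarities, clips):
    """Least squares weights w_j for sum_j w_j * f_clip(v_ij - I_i, c_j) ~ Igt_i - I_i.

    samples[i][j] and similarities[i][j] describe the j-th filter sample of the
    i-th filter location; samples are sorted by ascending similarity value.
    """
    rows, targets = [], []
    for cur, gt, vals, sims in zip(recon, ground_truth, samples, similarities):
        ordered = [v for _, v in sorted(zip(sims, vals), key=lambda p: p[0])]
        rows.append([f_clip(v - cur, c) for v, c in zip(ordered, clips)])
        targets.append(gt - cur)
    k = len(clips)
    # Normal equations (A^T A) w = A^T b.
    ata = [[sum(r[p] * r[q] for r in rows) for q in range(k)] for p in range(k)]
    atb = [sum(r[p] * t for r, t in zip(rows, targets)) for p in range(k)]
    return solve_linear(ata, atb)
```

Because the rows are sorted by similarity value before the fit, the j-th weight is always trained on the j-th most similar sample of every location, which is the property the sorting described above is intended to provide.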
[0149] Another option is to signal the parameters of the weighting function fw,i. In this case one or a set of parametric functions is defined. Let the clipped filter samples be defined as vclip,0, vclip,1, ..., vclip,n. With that, a set of predictions yi is computed. These values are averaged to generate the prediction y.
[0150] For example, an aggregation function with k different parametric prediction functions may be as shown below. Each parametric prediction function fw,i takes its respective parameters and all clipped filter samples vclip,0, vclip,1, ..., vclip,n as input. The parameters may be signalled to the decoder and the weighting parameters for the averaging may be signalled to the decoder.
[0151] The idea is that the values yi are good predictors for the ground-truth value in different scenarios. That means that, for example, y0 works well in textured regions while y1 is a good predictor for screen content. Signalling the averaging parameters to the decoder might be more efficient than coding the optimal parameters for a weighting function. This is the case as the number of predicted values k is usually much smaller than the number of filter samples. In this case, the parameters ai would be signalled such that vgt ≈ a0y0 + a1y1 + ... + anyn for all pixels belonging to this texture complexity class.
[0152] The parametric function may be an exponential weighting function which weights the filter samples 52 based on similarity and distance to the currently processed pixel 51. This function may be a bilateral weighting function, such as
[0153] Here, i and j are the x and y coordinates of the current filter sample and k and l are the coordinates of the currently processed location / pixel. The similarity of the template at location i, j is defined by sij. Example methods for calculating sij may include using a distance metric between the template at the currently processed location (k, l) and at the shifted sample location (i, j). Examples of such metrics are the mean of squared pixel differences or the sum of absolute pixel differences. σd and σr are hyperparameters of the weighting function. σd is used to achieve a spatial weighting. In some examples, filter samples closer to the currently processed pixel are more reliable predictors for the current pixel value than samples further away. For example, this is a typical characteristic in most natural pictures. If σd is low, pixels in a local neighborhood have a high influence, while pixels further away contribute almost nothing. On the other hand, setting σd to very high values will lead to an almost equal contribution (or a contribution mostly defined by σr) of the pixels inside the filter support. The σr parameter steers the dependency on the local similarity. Having a low value for σr means that pixels with higher similarity have a higher weight. Setting σr to very high values will lead to an averaging which does not consider the similarity. Both σd and σr might depend on the content of the picture (e.g. via the texture complexity) and can be used to achieve better results in filtering. The parameters (and hyperparameters) of the weighting function may be pre-defined. For example, if a set of functions is used, a set of parameters may be pre-defined. An averaging of the outputs is done to obtain the predicted value. If only one function is used, parameters could be signalled or defined based on the texture complexity map.
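The weighting function itself is not reproduced in this text. A common form of a bilateral weight that matches the behaviour described for σd and σr is sketched below. It assumes that sij is a dissimilarity such as the template MSE (so that it enters the exponent without further squaring) and that both terms use the usual 2σ² normalisation; these choices and the function name are assumptions, not taken from the source.

```python
from math import exp


def bilateral_weight(i, j, k, l, s_ij, sigma_d, sigma_r):
    """Weight of the filter sample at (i, j) for the current pixel at (k, l).

    spatial term: squared distance / (2 * sigma_d^2)
    range term:   s_ij / (2 * sigma_r^2), with s_ij a dissimilarity (assumed)
    """
    spatial = ((i - k) ** 2 + (j - l) ** 2) / (2.0 * sigma_d ** 2)
    similarity = s_ij / (2.0 * sigma_r ** 2)
    return exp(-spatial - similarity)


# A low sigma_d suppresses distant samples; a high sigma_d makes the weight
# depend almost only on the similarity term.
near = bilateral_weight(1, 0, 0, 0, 4.0, sigma_d=1.0, sigma_r=2.0)
far = bilateral_weight(3, 0, 0, 0, 4.0, sigma_d=1.0, sigma_r=2.0)
```

With this form, a small σr makes the weight fall off quickly as the dissimilarity grows, while a very large σr leaves only the spatial term, which corresponds to the averaging behaviour described above.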
[0154] Further implementation details
[0155] The decoder implementation in current video coding standards has to follow a set of rules. One of these is that all decoder operations must be defined by integer operations. This is done to ensure that every decoder implementation behaves exactly the same independent of the used hardware. Different CPUs may use slightly different implementations for floating point operations. This may affect the result of these computations. Therefore, the use of floating point operations is disallowed in the specification of video decoders. Consequently, an approximation by using integer operations needs to be done. Another limitation is that at most 32-bit numbers can be used for operations and at most 16-bit numbers may be used for storage of pictures. Changing the code to an integer implementation is comparatively simple for the texture complexity classification and similarity function and a bit more complicated for the aggregation function.
[0156] Implementation of texture complexity classification
[0157] The texture complexity classification calculates a texture complexity criterion. One example is the local variance. For a current pixel at position (x, y) the calculation is done by an equation of the following form.
[0158] Changing all operations directly to integer calculations is not optimal due to rounding errors.
[0159] Using this form reduces rounding errors in an integer implementation. Further improvement can be achieved by multiplying operands by some integer numbered factor. As an example, we show this for the calculation of the mean.
[0160] Shifting the mean to the left by s bits is equivalent to multiplying it by 2^s. This reduces rounding errors in the division by mn. This can be used in the calculation of the local variance.
[0161] The shift parameter should be set such that the maximum range is used to minimize rounding errors. Also, storing the values without shifting back to original precision may improve the performance of subsequent processing steps.
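The equations of this subsection are not reproduced in this text. The following sketch illustrates the two ideas with integer-only arithmetic: a mean that is scaled by 2^s before the division, and a variance that is rearranged so that only a single division is needed. The rearrangement (N·ΣX² − (ΣX)²) / N² is an assumed form chosen for illustration. Python integers do not overflow, so the 32-bit limit mentioned above would have to be checked separately when choosing s.

```python
def shifted_mean(samples, s):
    """Integer mean scaled by 2**s: the sum is shifted before the division."""
    return (sum(samples) << s) // len(samples)


def integer_variance(samples, s):
    """Integer local variance scaled by 2**s using a single division.

    Uses Var = (N * sum(x^2) - sum(x)^2) / N^2, an assumed rearrangement.
    """
    n = len(samples)
    total = sum(samples)
    total_sq = sum(v * v for v in samples)
    return ((n * total_sq - total * total) << s) // (n * n)


block = [1, 2, 2, 4]                  # exact mean 2.25, exact variance 1.1875
coarse = shifted_mean(block, 0)       # plain integer division loses the fraction
fine = shifted_mean(block, 4)         # scaled by 16, keeps four fractional bits
```

Keeping the result in its scaled form, rather than shifting back immediately, preserves the fractional bits for subsequent processing steps.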
[0162] Implementation of Similarity Function
[0163] The integerization of the similarity function is very similar to that of the texture complexity classification. The similarity function may use the Mean Squared Error (MSE) as similarity criterion. In this case, the function is calculated for an offset of ox and oy and has the following form (as discussed above with respect to Figure 5) .
[0164] After integerization this equation has the following form.
[0165] Storing this number at higher precision by applying a left shift is possible. This may make sense depending on the applied aggregation function. For codecs like HEVC / VVC with an internal precision and a 32-bit precision for calculations a left shift of up to 10 bits makes sense in order to decrease rounding errors.
[0166] Implementation of Aggregation Function
[0167] The integerization of the aggregation function is more difficult, since an integer approximation of the exponential function is required (see discussion of weights above) .
[0168] The same methods as before can be applied for the argument of the exponential function. For approximating the exponential function, different methods may be applied depending on the demands of the application. The first method is using a lookup table with linear interpolation. This involves storing a table of pre-calculated values of the exponential function. After the argument of the exponential function is calculated, the closest two values are found and the value of the exponential function is interpolated. To achieve high precision, a large number of values may need to be stored. A second version would be to use a Taylor series approximation of the function. Lastly, rewriting the function as a power of two is possible. This is a very fast method since powers of two can be efficiently computed by bit-shift operations. It uses the following equality.
[0169] Assuming that the argument is a positive integer number, the result can be approximated by a bit-shift as follows.
[0170] The division can be approximated by an integer division. However, since only integer numbered powers of two can be results of this approximation, results may be more imprecise than other methods.
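The equality and the approximation are not reproduced in this text. The power-of-two approach presumably relies on e^x = 2^(x / ln 2). The sketch below applies it to a decaying weight e^(-a) with a non-negative integer a, so that the power of two becomes a right shift of a fixed-point one; for a positive exponent the same idea would use a left shift. The rational stand-in 693/1000 for ln 2, the 16-bit output precision and the function name are assumptions.

```python
# Rational stand-in for ln 2 ~ 0.693 (assumption of this sketch).
LN2_NUM, LN2_DEN = 693, 1000


def exp_neg_shift(a, precision_bits=16):
    """Approximate exp(-a) * 2**precision_bits for an integer a >= 0.

    Uses exp(-a) = 2**(-a / ln 2); the exponent is truncated to an integer,
    so only integer powers of two can be produced.
    """
    shift = (a * LN2_DEN) // LN2_NUM      # integer division approximates a / ln 2
    return (1 << precision_bits) >> shift


weights = [exp_neg_shift(a) for a in range(4)]
```

Since the exponent is truncated, neighbouring arguments can map to the same shift or jump by a factor of two, which reflects the limited precision noted above.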
[0171] FIG. 6 shows a schematic illustration of a decoder 60 according to an embodiment. Specifically, FIG. 6 shows a schematic illustration of a decoder 60 configured to perform any of the decoder methods discussed herein. Such detailed descriptions thereof are omitted here for brevity.
[0172] As shown in FIG. 6, the decoder 60 comprises a processor 61 and a computer readable medium 62. The processor 61 and the computer readable medium 62 may be connected via a bus system. The computer readable medium is configured to store programs, instructions or codes. The processor 61 is configured to execute the programs, the instructions or the codes in the computer readable medium 62 so as to complete the operations in the decoder method embodiments herein.
[0173] Hence, in embodiments, the computer readable medium 62 is configured to store a computer program capable of being run in the processor 61, and the processor 61 is configured to run the computer program to perform steps in any of the decoder methods discussed herein.
[0174] FIG. 7 shows a schematic illustration of an encoder 70 according to an embodiment. The encoder 70 is configured to perform any of the encoder methods discussed herein; detailed descriptions of those methods are not repeated here for brevity.
[0175] As shown in FIG. 7, the encoder 70 comprises a processor 71 and a computer readable medium 72. The processor 71 and the computer readable medium 72 may be connected via a bus system. The computer readable medium 72 is configured to store programs, instructions or codes. The processor 71 is configured to execute the programs, the instructions or the codes in the computer readable medium 72 so as to complete the operations in the encoder method embodiments herein.
[0176] Hence, in embodiments, the computer readable medium 72 is configured to store a computer program capable of being run in the processor 71, and the processor 71 is configured to run the computer program to perform steps in any of the encoder methods discussed herein.
[0177] Embodiments of the invention are described within the context of an encoder or decoder for video coding. It will be understood that the disclosure is not limited to such embodiments and that the taught methods may also be applied to images, videos and pictures in other contexts. Moreover, the filter system described in the embodiments need not be applied for in-loop filtering or in an encoder or decoder context.
[0178] Embodiments of the invention can also provide a computer-readable medium having computer-executable instructions to cause one or more processors of a computing device to carry out the method of any of the embodiments of the invention.
[0179] Examples of computer-readable media include both volatile and non-volatile media, removable and non-removable media, and include, but are not limited to: solid state memories; removable disks; hard disk drives; magnetic media; and optical disks. In general, the computer-readable media include any type of medium suitable for storing, encoding, or carrying a series of instructions executable by one or more computers to perform any one or more of the processes and features described herein.
[0180] It will be appreciated that the functionality of each of the components discussed can be combined in a number of ways other than those discussed in the foregoing description. For example, in some embodiments, the functionality of more than one of the discussed devices can be incorporated into a single device. In other embodiments, the functionality of at least one of the devices discussed can be split into a plurality of separate (or distributed) devices.
[0181] Conditional language, such as “may”, is generally used to indicate that features or steps are used in a particular embodiment, but that alternative embodiments may include alternative features or omit such features altogether.
[0182] Furthermore, the method steps are not limited to the particular sequences described, and it will be appreciated that these can be combined in any other appropriate sequences. In some embodiments, this may result in some method steps being performed in parallel. In addition, in some embodiments, particular method steps may also be omitted altogether.
[0183] While certain embodiments have been discussed, it will be appreciated that these are used to exemplify the overall teaching of the present invention, and that various modifications can be made without departing from the scope of the invention. The scope of the invention is to be construed in accordance with the appended claims and any equivalents thereof.
[0184] Many further variations and modifications will suggest themselves to those versed in the art upon making reference to the foregoing illustrative embodiments, which are given by way of example only, and which are not intended to limit the scope of the invention, that being determined by the appended claims.
[0185] Acronyms
ALF: Adaptive Loop Filter
BIF: Bilateral In-Loop Filter
HEVC: High Efficiency Video Coding
MSE: Mean Squared Error
RD: Rate Distortion
SAO: Sample Adaptive Offset
VVC: Versatile Video Coding
Claims
1. A method of processing video data to provide in-loop filtering, performed by an encoder, the method comprising:
receiving an input picture comprising a plurality of pixels, each pixel having a respective pixel value;
determining, for each pixel in the input picture, a texture complexity using a texture complexity function;
applying a parameterised filter to each pixel, wherein one or more parameters for the filter are determined according to the texture complexity of the pixel, and wherein applying the parameterised filter comprises:
determining, for each pixel, one or more similarity values to one or more respective filter sample pixels in a filter support area; and
applying an aggregation function to each pixel, wherein the output of the aggregation function is dependent on the one or more similarity values determined for the pixel,
wherein the one or more parameters determined according to the texture complexity of the currently processed pixel influence at least one of: determination of the one or more similarity values, and one or more parameters of the aggregation function.
2. The method of claim 1, wherein determining, for each pixel in the input picture, the texture complexity comprises:
sectioning the input picture into a plurality of areas, each area comprising one or more pixels of the input picture;
determining the texture complexity for each area in the plurality of areas using the texture complexity function.
3. The method of claim 2, wherein determining the texture complexity for each area in the plurality of areas using the texture complexity function comprises, for each of the plurality of areas:
selecting a texture complexity support area for determining the texture complexity, wherein the texture complexity support area is larger than and comprises the area for which the texture complexity is being determined;
determining the texture complexity by applying the texture complexity function to the texture complexity support area.
4. The method of any one of claims 1 to 3, wherein the texture complexity function comprises local variance.
5. The method of any one of claims 1 to 4, wherein the texture complexity is a texture complexity value.
6. The method of any one of claims 1 to 4, wherein the texture complexity is a texture complexity class.
7. The method of claim 6, wherein determining the texture complexity comprises:
determining a texture complexity value using the texture complexity function;
applying a quantization function to the texture complexity value, wherein the output of the quantization function is the texture complexity class.
8. The method of claim 7, wherein the texture complexity class is based on a plurality of pre-defined texture complexity thresholds.
9. The method of claim 7, wherein the texture complexity class is based on a plurality of texture complexity thresholds, wherein the plurality of texture complexity thresholds are variable.
10. The method of any one of claims 1 to 9, wherein determining, for each pixel, one or more similarity values to one or more respective filter sample pixels in a filter support area comprises:
determining the one or more similarity values using a template similarity.
11. The method of claim 10, wherein one or more template similarity parameters are selected in dependence on the texture complexity.
12. The method of claim 11, wherein the template similarity parameters comprise a template size.
13. The method of any one of claims 1 to 12, wherein similarity values are determined for a subset of locations in the filter support area.
14. The method of any one of claims 1 to 13, wherein for a subset of locations in the filter support area, the similarity value is set to a pre-defined value.
15. The method of any one of claims 1 to 14, wherein the filter support area is diamond, square or circular shaped.
16. The method of any one of claims 1 to 15, wherein the filter support area is selected based on rate-distortion optimization.
17. The method of any one of claims 1 to 16, wherein the one or more similarity values are determined using mean squared error as similarity criterion.
18. The method of any one of claims 1 to 17, wherein for each pixel in the input picture, the aggregation function determines the output of the aggregation function by adding a weighted sum over the one or more filter samples in the filter support area to the pixel value of the pixel,
wherein for each pixel in the input picture, the one or more parameters of the aggregation function comprise one or more weights and one or more clipping values; and
each term in the weighted sum comprises a weight of the one or more weights and a clipping function, wherein the clipping function clips the distance between the pixel value of the pixel and the filter sample pixel value of a current filter sample pixel, and the clipping range of the clipping function is defined by a clipping value in the one or more clipping values.
19. The method of claim 18, wherein the one or more clipping values are selected from a set of pre-defined clipping values.
20. The method of claim 19, wherein the one or more clipping values are selected by rate-distortion optimization.
21. The method of any one of claims 18 to 20, wherein for each term in the weighted sum the clipping value in the one or more clipping values is optimized.
22. The method of claim 21, wherein the optimized clipping value for each term of the weighted sum is selected from a set of pre-defined clipping values.
23. The method of claim 22, wherein the optimized clipping value for each term of the weighted sum is selected from a subset of the set of pre-defined clipping values; and wherein the subset is optimized across all clipping values in the weighted sum.
24. The method of any one of claims 18 to 23, wherein the one or more clipping values are determined from a function f_c dependent on the texture complexity and the similarity value between the pixel and a current filter sample pixel.
25. The method of claim 24, wherein a clipping value is determined for each term in the weighted sum, and the function f_c further depends on a filter sample pixel value of a current filter sample pixel.
26. The method of any one of claims 18 to 25, wherein the one or more weights for each pixel in the input picture are the same one or more weights for all pixels in the input picture;
wherein the one or more weights are optimized using a least squares optimization on a matrix optimization system, wherein the matrix optimization system is defined as [equation], wherein w_j are the weights, I_gt(x_i, y_i) is a ground truth for the i-th pixel at position (x_i, y_i) in the picture, I(x_i, y_i) is the pixel value for the i-th pixel at that position, v_ij is the j-th filter sample pixel value for the i-th pixel, c_j is the clipping value corresponding to the term in the weighted sum that comprises w_j, and f_clip is a clipping function; and
for each pixel in the picture the filter samples are sorted such that their respective similarity values satisfy s_i(j-1) <= s_ij for all j.
27. The method of any one of claims 18 to 26, wherein, for each pixel in the picture, the one or more weights are determined from a weighting function f_w.
28. The method of claim 27, wherein the weighting function f_w is expressed as [equation], where i and j are the x and y coordinates of the current filter sample in the picture, k and l are the coordinates of the current pixel in the picture, s_ij is the similarity value between the current pixel and the current filter sample, and σ_d and σ_r are parameters controlling the impact of, respectively, distance and similarity between the current filter sample and the current pixel on the weight.
29. The method of any one of claims 1 to 28, wherein applying a parameterised filter further comprises:
applying one or more further aggregation functions to each pixel; and
for each pixel, averaging over the output of the aggregation function and the outputs of the one or more further aggregation functions.
30. The method of claim 29, wherein the averaging is an adaptive averaging.
31. A computer-readable medium comprising computer executable instructions stored thereon which when executed by a computing device cause the computing device to perform the method of any one of claims 1 to 30.
32. An encoder, comprising:
one or more processors; and
a computer-readable medium comprising computer executable instructions stored thereon which when executed by the one or more processors cause the one or more processors to perform the method of any one of claims 1 to 30.
33. A non-transitory computer-readable medium storing a bitstream, the bitstream being generated using the method of any one of claims 1 to 30.
34. A method of processing video data to provide in-loop filtering, performed by a decoder, the method comprising:
receiving an input picture comprising a plurality of pixels, each pixel having a respective pixel value;
receiving, for each pixel in the input picture, an indication of texture complexity;
applying a parameterised filter to each pixel, wherein one or more parameters for the filter are determined according to the texture complexity of the pixel, and wherein applying the parameterised filter comprises:
determining, for each pixel, one or more similarity values to one or more respective filter sample pixels in a filter support area; and
applying an aggregation function to each pixel, wherein the output of the aggregation function is dependent on the one or more similarity values determined for the pixel,
wherein the one or more parameters determined according to the texture complexity of the currently processed pixel influence at least one of: determination of the one or more similarity values, and one or more parameters of the aggregation function.
35. The method of claim 34, wherein determining, for each pixel, one or more similarity values to one or more respective filter sample pixels in a filter support area comprises:
determining the one or more similarity values using a template similarity.
36. The method of claim 35, wherein one or more template similarity parameters are selected in dependence on the texture complexity.
37. The method of claim 36, wherein the template similarity parameters comprise a template size.
38. The method of any one of claims 34 to 37, wherein similarity values are determined for a subset of locations in the filter support area.
39. The method of any one of claims 34 to 38, wherein for a subset of locations in the filter support area, the similarity value is set to a pre-defined value.
40. The method of any one of claims 34 to 39, wherein the filter support area is diamond, square or circular shaped.
41. The method of any one of claims 34 to 40, wherein the filter support area is selected based on rate-distortion optimization.
42. The method of any one of claims 34 to 41, wherein the one or more similarity values are determined using mean squared error as similarity criterion.
43. The method of any one of claims 34 to 42, wherein for each pixel in the input picture, the aggregation function determines the output of the aggregation function by adding a weighted sum over the one or more filter samples in the filter support area to the pixel value of the pixel,
wherein for each pixel in the input picture, the one or more parameters of the aggregation function comprise one or more weights and one or more clipping values; and
each term in the weighted sum comprises a weight of the one or more weights and a clipping function, wherein the clipping function clips the distance between the pixel value of the pixel and the filter sample pixel value of a current filter sample pixel, and the clipping range of the clipping function is defined by a clipping value in the one or more clipping values.
44. The method of claim 43, wherein the one or more clipping values are selected from a set of pre-defined clipping values.
45. The method of claim 44, wherein the one or more clipping values are selected by rate-distortion optimization.
46. The method of any one of claims 43 to 45, wherein for each term in the weighted sum the clipping value in the one or more clipping values is optimized.
47. The method of claim 46, wherein the optimized clipping value for each term of the weighted sum is selected from a set of pre-defined clipping values.
48. The method of claim 47, wherein the optimized clipping value for each term of the weighted sum is selected from a subset of the set of pre-defined clipping values; and wherein the subset is optimized across all clipping values in the weighted sum.
49. The method of any one of claims 43 to 48, wherein the one or more clipping values are determined from a function f_c dependent on the texture complexity and the similarity value between the pixel and a current filter sample pixel.
50. The method of claim 49, wherein a clipping value is determined for each term in the weighted sum, and the function f_c further depends on a filter sample pixel value of a current filter sample pixel.
51. The method of any one of claims 43 to 50, wherein the one or more weights for each pixel in the input picture are the same one or more weights for all pixels in the input picture;
wherein the one or more weights are optimized using a least squares optimization on a matrix optimization system, wherein the matrix optimization system is defined as [equation], wherein w_j are the weights, I_gt(x_i, y_i) is a ground truth for the i-th pixel at position (x_i, y_i) in the picture, I(x_i, y_i) is the pixel value for the i-th pixel at that position, v_ij is the j-th filter sample pixel value for the i-th pixel, c_j is the clipping value corresponding to the term in the weighted sum that comprises w_j, and f_clip is a clipping function; and
for each pixel in the picture the filter samples are sorted such that their respective similarity values satisfy s_i(j-1) <= s_ij for all j.
52. The method of any one of claims 43 to 51, wherein, for each pixel in the picture, the one or more weights are determined from a weighting function f_w.
53. The method of claim 52, wherein the weighting function f_w is expressed as [equation], where i and j are the x and y coordinates of the current filter sample in the picture, k and l are the coordinates of the current pixel in the picture, s_ij is the similarity value between the current pixel and the current filter sample, and σ_d and σ_r are parameters controlling the impact of, respectively, distance and similarity between the current filter sample and the current pixel on the weight.
54. The method of any one of claims 34 to 53, wherein applying a parameterised filter further comprises:
applying one or more further aggregation functions to each pixel; and
for each pixel, averaging over the output of the aggregation function and the outputs of the one or more further aggregation functions.
55. The method of claim 54, wherein the averaging is an adaptive averaging.
56. A computer-readable medium comprising computer executable instructions stored thereon which when executed by a computing device cause the computing device to perform the method of any one of claims 34 to 55.
57. A decoder, comprising:
one or more processors; and
a computer-readable medium comprising computer executable instructions stored thereon which when executed by the one or more processors cause the one or more processors to perform the method of any one of claims 34 to 55.