Neural network in-loop flexibility syntax for weighting

By implementing weighting mechanisms to regulate the interaction between neural network loop filters and other filters in video codecs, the method improves codec performance and flexibility, reduces complexity, and addresses the limitations of current video coding standards.

WO2026135538A1 · PCT designated stage · Publication Date: 2026-06-25 · TELEFONAKTIEBOLAGET LM ERICSSON (PUBL)

Patent Information

Authority / Receiving Office
WO · WO
Patent Type
Applications
Current Assignee / Owner
TELEFONAKTIEBOLAGET LM ERICSSON (PUBL)
Filing Date
2025-09-26
Publication Date
2026-06-25

AI Technical Summary

Technical Problem

Current video coding standards lack high-level syntax mechanisms for governing the interaction of neural network-based loop filters with other concepts and mechanisms in block-based image or video codecs, leading to suboptimal performance, flexibility, and computational complexity.

Method used

A method for decoding and producing a bitstream that includes obtaining and applying weighting information to regulate the interaction between neural network loop filters and other filters, allowing for finer granularity and flexibility in weighting between neural network loop filter outputs and other filter outputs, such as deblocking filters.

Benefits of technology

Improves codec performance by enhancing compression efficiency, reducing complexity, and decreasing energy consumption through better integration of neural network loop filters with existing codec components.

✦ Generated by Eureka AI based on patent content.

Smart Images

  • Figure SE2025050851_25062026_PF_FP_ABST

Abstract

A method (1000) for decoding a picture from a bitstream. The method includes obtaining (s1002) weighting information including at least a first weighting information component and a second weighting information component specifying one or more weight factors specifying a weighting between first-NNLF-output and non-first-NNLF-output. The one or more weight factors include at least a first weight factor, and the first-NNLF-output is produced using a first neural network (NN) loop filter (NNLF), based on an NN block of an input block of the picture. The method also includes decoding (s1004) the picture using the first and the second weighting information components.

Description

NEURAL NETWORK IN-LOOP FLEXIBILITY SYNTAX FOR WEIGHTING

TECHNICAL FIELD

[0001] Disclosed are embodiments related to a method for decoding a picture from a video bitstream, a method for producing a video bitstream, a corresponding computer program, a corresponding carrier, a decoding apparatus for decoding a picture from a video bitstream, and an encoding apparatus for producing a video bitstream.

BACKGROUND

[0002] 1. Video compression

[0003] Video is a dominant form of data traffic in today’s networks and is projected to increase its share. One way to reduce the data traffic in a network caused by the transmission of video data is to compress the video data before transmission. For example, the video data (a.k.a., source video) can be encoded to a coded video bitstream (or just “bitstream” for short), which then can be stored and transmitted to end users. A decoder can produce video from the encoded data in the bitstream and display the produced video (also known as the decoded video) on a screen. If a lossless video compression is used, the decoded video should be identical to the source video.

[0004] Because the encoder may not know what kind of device the bitstream is going to be sent to, it is advantageous for the encoder to compress the video according to a video coding standard, such as the Versatile Video Coding (VVC) standard, the High Efficiency Video Coding (HEVC) standard, or the AOMedia Video 1 (AV1) video coding format, to name a few. This way, all devices that support the standard (e.g., VVC) can decode the encoded video data. Compression can be lossless, i.e., the decoded video will be identical to the source video given to the encoder, or lossy, where a certain degradation of content is accepted. Lossy compression allows for significantly lower bit rates, i.e., the compression ratio can be much higher, because reproducing image noise perfectly makes lossless compression expensive.

[0005] A video sequence contains a sequence of pictures. A color space commonly used in video sequences is YCbCr, where Y is the luma (brightness) component and Cb and Cr are the chroma components. Sometimes the Cb and Cr components are called U and V. Other color spaces can also be used, such as ICtCp, IPT, constant-luminance YCbCr, RGB, YCoCg, etc., and this disclosure is applicable also in these cases. The terms luma and chroma channels are sometimes used instead of luma and chroma components and are used interchangeably in this disclosure.

[0006] 2. VVC, AV1, and HEVC

[0007] VVC and its predecessor HEVC are block-based video codecs standardized and developed jointly by ITU-T and MPEG. AV1 is another block-based video codec, specified by the Alliance for Open Media (AOM).

[0008] The codecs utilize both temporal and spatial prediction. VVC, AV1, and HEVC are similar in many aspects. Spatial prediction is achieved using intra (I) prediction from within the current picture. Temporal prediction is achieved using uni-directional (P) or bi-directional (B) inter prediction on the block level from previously decoded reference pictures. In the encoder, the difference between the original pixel data and the predicted pixel data, referred to as the residual, is transformed into the frequency domain, quantized, and then entropy coded before being transmitted together with necessary prediction parameters, such as prediction mode and motion vectors, which are also entropy coded. The terms pixels, pixel values, samples, and sample values are used interchangeably in this disclosure. The decoder performs entropy decoding, inverse quantization, and inverse transformation to obtain the residual, and then adds the residual to the intra or inter prediction to reconstruct a picture. The VVC version 1 specification was published as Rec. ITU-T H.266 | ISO / IEC 23090-3, “Versatile Video Coding”, in 2020.
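As a minimal sketch of the final decoder step described above, the function below adds a decoded residual to the prediction sample by sample. It assumes the residual has already been produced by entropy decoding, inverse quantization, and inverse transformation; the function name and the clipping to the range allowed by the bit depth are illustrative assumptions, not taken from any specification text.

```python
from typing import List


def reconstruct_block(prediction: List[int], residual: List[int], bit_depth: int = 10) -> List[int]:
    """Add a decoded residual to the intra/inter prediction, sample by sample.

    Assumption: results are clipped to the valid range [0, 2**bit_depth - 1].
    """
    if len(prediction) != len(residual):
        raise ValueError("prediction and residual must have the same number of samples")
    max_val = (1 << bit_depth) - 1
    return [min(max(p + r, 0), max_val) for p, r in zip(prediction, residual)]


print(reconstruct_block([512, 1000, 3], [10, 40, -7]))  # [522, 1023, 0]
```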

[0009] Exploratory work is ongoing for the next-generation video codec in the joint JVET collaboration between ITU-T and ISO / IEC, with one track for traditional coding tools in the Enhanced Compression Model (ECM) software and one track for neural network (NN) coding tools. AOM is also working on a successor to the AV1 video codec, likely to be called AV2.

[0010] 3. Blocks and Units

[0011] In many video coding standards, such as HEVC, AV1, and VVC, each component is split into blocks and the coded video bitstream consists of a series of coded blocks. A block is a two-dimensional array of values (e.g., sample values). It is common in video coding that the picture is split into units that cover a specific area of the picture. Each unit consists of all blocks from all components that make up that specific area, and each block belongs fully to one unit. The macroblock in H.264 and the Coding Unit (CU) in HEVC and VVC are examples of units.

[0012] A transform used in coding can be applied to a block and these blocks are known under the name “transform blocks”. Also, a single prediction mode can be applied to a block and these blocks can be called “prediction blocks”.

[0013] 4. VVC block structure

[0014] The VVC video coding standard uses a block structure referred to as quadtree plus binary tree plus ternary tree block structure (QTBT+TT), where each picture is first partitioned into square blocks called coding tree units (CTUs). All CTUs have the same size, and this partitioning is done without any syntax controlling it. Each CTU is further partitioned into coding units (CUs) that can have either square or rectangular shapes. The CTU is first partitioned by a quadtree structure; the resulting blocks may then be further partitioned into equally sized partitions, either vertically or horizontally, in a binary structure to form CUs. A block can thus have either a square or rectangular shape.

[0015] The depth of the quadtree and binary tree can be set by the encoder in the bitstream. The ternary tree (TT) part adds the possibility to divide a CU into three partitions instead of two equally sized partitions; this increases the possibilities to use a block structure that better fits the content structure in a picture. A CTU may comprise one or three coding tree blocks (CTBs), where each of the CTBs contains an NxN block of samples for a channel. For monochrome video with only one channel, each CTU comprises only one CTB, whereas for YCbCr video each CTU comprises one luma CTB and two chroma CTBs. The chroma CTBs may be spatially subsampled compared to the luma CTB, e.g., by a factor of two in the vertical and horizontal directions. In the text below, the size of a CTU is sometimes given as the size of its luma CTB, which is equivalent.
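The CTU grid and CTB sizes described above can be illustrated with a short calculation. This sketch assumes a 128x128 luma CTU and uses a hypothetical helper name; CTUs at the right and bottom picture edges are counted even when they extend beyond the picture.

```python
import math
from typing import Optional, Tuple

# Horizontal and vertical chroma subsampling factors per chroma format.
# "4:0:0" (monochrome) has no chroma CTBs at all.
CHROMA_SUBSAMPLING = {"4:0:0": None, "4:2:0": (2, 2), "4:2:2": (2, 1), "4:4:4": (1, 1)}


def ctu_layout(pic_w: int, pic_h: int, ctu_size: int = 128,
               chroma_format: str = "4:2:0") -> Tuple[int, int, Optional[Tuple[int, int]]]:
    """Return (CTU columns, CTU rows, chroma CTB (width, height) or None)."""
    cols = math.ceil(pic_w / ctu_size)  # a partial CTU at the right edge still counts
    rows = math.ceil(pic_h / ctu_size)  # a partial CTU at the bottom edge still counts
    sub = CHROMA_SUBSAMPLING[chroma_format]
    chroma_ctb = None if sub is None else (ctu_size // sub[0], ctu_size // sub[1])
    return cols, rows, chroma_ctb


print(ctu_layout(1920, 1080))                # (15, 9, (64, 64))
print(ctu_layout(1920, 1080, 128, "4:0:0"))  # (15, 9, None)
```

For 4:2:0 each chroma CTB is half the luma CTB size in both directions, matching the factor-of-two example in the paragraph above.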

[0016] 5. Parameter Sets

[0017] HEVC specifies three types of parameter sets: the picture parameter set (PPS), the sequence parameter set (SPS), and the video parameter set (VPS). The PPS contains data that is common for a whole picture, the SPS contains data that is common for a coded video sequence (CVS), and the VPS contains data that is common for multiple CVSs, e.g., data for multiple scalability layers in the bitstream.

[0018] VVC also specifies the PPS, the SPS, and the VPS, and one additional parameter set, the adaptation parameter set (APS). The APS in VVC carries parameters needed for the adaptive loop filter (ALF) tool, the luma mapping and chroma scaling (LMCS) tool, and the scaling list tool.

[0019] Both HEVC and VVC allow certain information (e.g., parameter sets) to be provided by external means. “By external means” should be interpreted such that the information is not provided in the coded video bitstream but by some other means not specified in the video codec specification, e.g., via metadata possibly provided in a different data channel, as a constant in the decoder, or provided through an API to the decoder.

[0020] In AV1, a sequence header is used, which is similar to the SPS.

[0021] 6. NAL units

[0022] Both VVC and HEVC define a Network Abstraction Layer (NAL). All the data, i.e., both Video Coding Layer (VCL) and non-VCL data in HEVC and VVC, is encapsulated in NAL units. A VCL NAL unit contains data that represents picture sample values, such as coded slices. A non-VCL NAL unit contains additional associated data such as parameter sets and supplemental enhancement information (SEI) messages. The NAL unit in VVC and HEVC begins with a header called the NAL unit header. A NAL unit type syntax element, nal_unit_type, specifies how the NAL unit should be parsed and decoded. The bytes after the NAL unit header are the payload of the type indicated by the NAL unit type. A bitstream consists of a series of concatenated NAL units.
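To illustrate the two-byte NAL unit header in VVC, the sketch below splits it into its fields following the H.266 layout (forbidden_zero_bit, nuh_reserved_zero_bit, nuh_layer_id, nal_unit_type, nuh_temporal_id_plus1). HEVC uses a different field layout, so this applies to VVC only; the function name is hypothetical and the mapping of nal_unit_type values to names is left out.

```python
from typing import Dict


def parse_vvc_nal_unit_header(data: bytes) -> Dict[str, int]:
    """Split the two-byte VVC (H.266) NAL unit header into its fields.

    Bit layout, most significant bit first:
      byte 0: forbidden_zero_bit (1) | nuh_reserved_zero_bit (1) | nuh_layer_id (6)
      byte 1: nal_unit_type (5) | nuh_temporal_id_plus1 (3)
    """
    if len(data) < 2:
        raise ValueError("a VVC NAL unit header is two bytes long")
    b0, b1 = data[0], data[1]
    return {
        "forbidden_zero_bit": b0 >> 7,
        "nuh_reserved_zero_bit": (b0 >> 6) & 0x1,
        "nuh_layer_id": b0 & 0x3F,
        "nal_unit_type": b1 >> 3,
        "nuh_temporal_id_plus1": b1 & 0x7,
    }


header = parse_vvc_nal_unit_header(bytes([0x00, 0x79]))
print(header["nal_unit_type"], header["nuh_temporal_id_plus1"])  # 15 1
```

The payload of the NAL unit then starts at `data[2:]`.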

[0023] 7. Open bitstream units (OBUs)

[0024] NAL units in AV1 are called Open Bitstream Units (OBUs). Similar to HEVC and VVC, all AV1 data is encapsulated in OBUs, and an AV1 bitstream consists of a series of concatenated OBUs. The OBU begins with a header called obu_header() that includes a code word, obu_type, that specifies the type of the OBU. These are equivalent to nal_unit_header() and nal_unit_type in VVC and HEVC. The OBU type identifies the type of data that is carried in the OBU and specifies how the OBU should be parsed and decoded.
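For comparison, a minimal sketch of parsing the one-byte AV1 obu_header(), plus the optional extension byte, is shown below. The field layout follows the AV1 bitstream specification; the function name is hypothetical, and the leb128-coded obu_size that follows when obu_has_size_field is 1 is not parsed.

```python
from typing import Dict


def parse_obu_header(data: bytes) -> Dict[str, int]:
    """Parse an AV1 obu_header() and, if present, its extension byte.

    byte 0: obu_forbidden_bit (1) | obu_type (4) | obu_extension_flag (1)
            | obu_has_size_field (1) | obu_reserved_1bit (1)
    byte 1 (only if obu_extension_flag == 1):
            temporal_id (3) | spatial_id (2) | extension_header_reserved_3bits (3)
    """
    if not data:
        raise ValueError("empty OBU")
    b0 = data[0]
    hdr = {
        "obu_forbidden_bit": b0 >> 7,
        "obu_type": (b0 >> 3) & 0xF,
        "obu_extension_flag": (b0 >> 2) & 0x1,
        "obu_has_size_field": (b0 >> 1) & 0x1,
        "obu_reserved_1bit": b0 & 0x1,
    }
    if hdr["obu_extension_flag"]:
        if len(data) < 2:
            raise ValueError("missing OBU extension header byte")
        b1 = data[1]
        hdr["temporal_id"] = b1 >> 5
        hdr["spatial_id"] = (b1 >> 3) & 0x3
    return hdr


# 0x12 is the header byte of a temporal delimiter OBU (obu_type 2) with a size field.
print(parse_obu_header(bytes([0x12, 0x00]))["obu_type"])  # 2
```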

[0025] 8. Slices and tiles

[0026] The concept of slices in HEVC divides the picture into independently coded slices, where decoding of one slice in a picture is independent of other slices of the same picture. Different coding types could be used for slices of the same picture, i.e., a slice could either be an I-slice, P-slice, or B-slice. One purpose of slices is to enable resynchronization in case of data loss. In HEVC, a slice is a set of CTUs.

[0027] The VVC, AV1, and HEVC video coding standards include a tool called tiles that divides a picture into rectangular, spatially independent regions. Using tiles, a picture in VVC can be partitioned into rows and columns of CTUs, where a tile is the intersection of a row and a column.
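Because a tile is the intersection of a tile row and a tile column, the tile containing a given CTU can be found from the tile column widths and tile row heights. The sketch below assumes these are given in CTU units and that tiles are numbered in raster scan order; the helper name is hypothetical.

```python
from bisect import bisect_right
from itertools import accumulate
from typing import List


def tile_index(ctu_x: int, ctu_y: int, col_widths: List[int], row_heights: List[int]) -> int:
    """Return the raster-scan index of the tile that contains CTU (ctu_x, ctu_y).

    col_widths / row_heights give each tile column width and tile row height in CTUs.
    """
    col_ends = list(accumulate(col_widths))   # exclusive end position of each tile column
    row_ends = list(accumulate(row_heights))  # exclusive end position of each tile row
    if not (0 <= ctu_x < col_ends[-1] and 0 <= ctu_y < row_ends[-1]):
        raise ValueError("CTU position outside the picture")
    col = bisect_right(col_ends, ctu_x)       # first column whose end lies beyond ctu_x
    row = bisect_right(row_ends, ctu_y)
    return row * len(col_widths) + col


# Three tile columns of 5 CTUs and two tile rows of 4 and 5 CTUs.
print(tile_index(7, 3, [5, 5, 5], [4, 5]))   # 1
print(tile_index(14, 8, [5, 5, 5], [4, 5]))  # 5
```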

[0028] In VVC, a slice is defined as an integer number of complete tiles or an integer number of consecutive complete CTU rows within a tile of a picture that are exclusively contained in a single NAL unit. In VVC, a picture may be partitioned into either raster scan slices or rectangular slices. A raster scan slice consists of a number of complete tiles in raster scan order. A rectangular slice consists of a group of tiles that together occupy a rectangular region in the picture or a consecutive number of CTU rows inside one tile. Each slice has a slice header comprising syntax elements. Decoded slice header values from these syntax elements are used when decoding the slice. Each slice is carried in one VCL NAL unit. In AV1, the tile group corresponds to a slice. A tile group OBU in AV1 contains an integer number of coded tiles. In an early draft of the VVC specification, slices were referred to as tile groups.

[0029] 9. Picture header

[0030] In VVC, a coded picture contains a picture header (PH) structure. The picture header structure contains syntax elements that are common for all slices of the associated picture. The picture header structure may be signaled in its own non-VCL NAL unit with NAL unit type PH_NUT or included in the slice header, provided that there is only one slice in the coded picture. For a CVS where not all pictures are single-slice pictures, each coded picture must be preceded by a picture header that is signaled in its own NAL unit. HEVC does not support picture headers. The equivalent header in AV1 is referred to as a frame header.

[0031] 10. Subpictures

[0032] Subpictures are supported in VVC where a subpicture is defined as a rectangular region of one or more slices within a picture. This means a subpicture contains one or more slices that collectively cover a rectangular region of a picture. In VVC, subpicture location and size are signaled in the SPS. Boundaries of a subpicture region may be treated as picture boundaries (excluding in-loop filtering operations) conditioned to a per-subpicture flag subpic_treated_as_pic_flag[ i ] in the SPS. Also loop-filtering on subpicture boundaries is conditioned to a per-subpicture flag loop_filter_across_subpic_enabled_flag[ i ] in the SPS.

[0033] Bitstream extraction and merge operations are supported through subpictures in VVC and could, for instance, comprise extracting one or more subpictures from a first bitstream, extracting one or more subpictures from a second bitstream, and merging the extracted subpictures into a new third bitstream.

[0034] 11. Loop Filtering in VVC

[0035] VVC contains three in-loop filters that are not based on neural networks: a deblocking filter, a sample adaptive offset (SAO) filter, and an adaptive loop filter (ALF). The deblocking filter is used to remove block artifacts by smoothing discontinuities in the horizontal and vertical directions across block boundaries. The deblocking filter uses a block boundary strength (BS) parameter to determine the filtering strength. The BS can have values 0, 1, and 2, where a larger value indicates a stronger filtering. The output of the deblocking filter is further processed by the SAO filter, and the output of SAO is then processed by ALF.

[0036] The output of ALF can then be put into the decoded picture buffer (DPB), which contains decoded pictures that may be used for prediction of subsequently encoded (or decoded) pictures. Since the deblocking filter, the SAO filter and ALF in this way influence the pictures in the DPB used for prediction, they are classified as in-loop filters, also known as loop filters. This means that changes done by the loop filter may influence not only the current picture but future pictures. It is possible for a decoder to further filter the picture in the DPB, but not store the filtered output in the DPB. In contrast to loop filters, such a filter is not influencing future predictions and is therefore classified as a post-processing filter, also known as a postfilter. Postfiltering is generally optional for decoders and thereby not required to be performed for decoder implementations to conform to a standard specification.
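The ordering of the in-loop filters and the distinction from a post-filter can be sketched structurally as follows. Each filter is a stand-in callable and a picture is modeled as a flat list of samples; this is an illustration under those assumptions, with a hypothetical function name, not decoder code.

```python
from typing import Callable, List, Optional

Picture = List[int]
Filter = Callable[[Picture], Picture]


def run_filters(recon: Picture, deblock: Filter, sao: Filter, alf: Filter,
                dpb: List[Picture], postfilter: Optional[Filter] = None) -> Picture:
    """Apply deblocking -> SAO -> ALF, store the result in the DPB, and
    optionally post-filter the picture that is output for display."""
    filtered = alf(sao(deblock(recon)))  # in-loop filters, in the order described above
    dpb.append(filtered)                 # available for prediction of later pictures
    if postfilter is None:
        return filtered
    return postfilter(filtered)          # display only; never written back to the DPB


dpb: List[Picture] = []
identity: Filter = lambda p: p
shown = run_filters([1, 2], identity, identity, identity, dpb, lambda p: [v + 1 for v in p])
print(shown, dpb)  # [2, 3] [[1, 2]]
```

The post-filter changes only what is displayed, so later pictures are still predicted from the in-loop filtered picture stored in the DPB.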

[0037] In ECM, the bilateral filter (BIF) has also been added. The filter is carried out in the sample adaptive offset (SAO) loop-filter stage and uses samples from deblocking as input. In ECM, each of the BIF and SAO filters creates an offset per sample, and these are added to the input sample and then clipped.
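Assuming the BIF and SAO offsets for each sample have already been computed (their derivation is not shown), the combination step described above reduces to an add-and-clip. The clipping range for a given bit depth and the function name are assumptions for illustration.

```python
from typing import List


def bif_sao_output(deblocked: List[int], bif_offsets: List[int],
                   sao_offsets: List[int], bit_depth: int = 10) -> List[int]:
    """Add the per-sample BIF and SAO offsets to the deblocked sample, then clip."""
    max_val = (1 << bit_depth) - 1
    return [min(max(s + b + o, 0), max_val)
            for s, b, o in zip(deblocked, bif_offsets, sao_offsets)]


print(bif_sao_output([0, 512, 1020], [-3, 2, 5], [-1, 1, 4]))  # [0, 515, 1023]
```

Summing both offsets before a single clip, rather than clipping after each filter, follows the description that both offsets are added to the input sample and then clipped.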

[0038] Another loop filter that was considered in the development of VVC is the Hadamard filter.

[0039] 12. Virtual Boundaries

[0040] Virtual boundaries in VVC are boundaries within pictures where the in-loop filter operations that would apply across the boundaries are disabled. The virtual boundary signaling in VVC allows turning off in-loop filtering at signaled positions within the coded pictures that do not have to be aligned with the CTU boundaries. The disabled in-loop filter operations across the boundary include all of deblocking, SAO, and ALF. In VVC, there may be at most 3 virtual boundaries in the horizontal direction and 3 virtual boundaries in the vertical direction, and the locations of virtual boundaries are signalled either in the SPS or in the PH. A virtual boundary in VVC must be placed on an 8x8 luma sample grid. In one example, for coding 360° video, when the 360° video uses a particular projection format that introduces discontinuities, virtual boundaries allow disabling of in-loop filtering across these boundaries without the need for content scaling to align projection discontinuities and CTU boundaries. Other use cases for virtual boundaries are region-of-interest (ROI) tiles and the gradual decoding refresh (GDR) feature.
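The two signaling constraints stated above (at most three boundaries per direction, positions on an 8x8 luma grid) can be checked as in the sketch below, which also shows how a filter could test whether two columns straddle a vertical boundary. The helper names are hypothetical, and the requirement that a boundary lie strictly inside the picture is an added assumption.

```python
from typing import Sequence

MAX_VB_PER_DIRECTION = 3  # at most 3 vertical and 3 horizontal virtual boundaries
VB_GRID = 8               # positions lie on an 8x8 luma sample grid


def check_virtual_boundaries(ver_x: Sequence[int], hor_y: Sequence[int],
                             pic_w: int, pic_h: int) -> None:
    """Raise ValueError if the virtual boundary positions violate the constraints.

    ver_x: x positions (luma samples) of vertical boundaries.
    hor_y: y positions (luma samples) of horizontal boundaries.
    """
    for name, positions, extent in (("vertical", ver_x, pic_w), ("horizontal", hor_y, pic_h)):
        if len(positions) > MAX_VB_PER_DIRECTION:
            raise ValueError(f"more than {MAX_VB_PER_DIRECTION} {name} virtual boundaries")
        for p in positions:
            if p % VB_GRID != 0:
                raise ValueError(f"{name} boundary at {p} is not on the 8-sample grid")
            if not 0 < p < extent:  # assumption: boundary lies strictly inside the picture
                raise ValueError(f"{name} boundary at {p} is outside the picture")


def crosses_vertical_boundary(x0: int, x1: int, ver_x: Sequence[int]) -> bool:
    """True if samples in columns x0 and x1 lie on different sides of a vertical
    boundary; a boundary at position p separates column p - 1 from column p."""
    lo, hi = min(x0, x1), max(x0, x1)
    return any(lo < p <= hi for p in ver_x)


check_virtual_boundaries([512], [256], 1920, 1080)  # passes silently
print(crosses_vertical_boundary(511, 512, [512]))   # True
```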

[0041] 13. Gradual decoding refresh

[0042] VVC supports gradual decoding refresh (GDR) as an alternative to Intra picture refresh. In the bitstream, GDR is indicated by a GDR picture and a delta picture order count (POC) value. The delta POC value is used to identify when the video is fully refreshed. The first picture that is fully refreshed is called the recovery point picture. There may be pictures in between the GDR picture and the recovery point picture. These pictures are called recovering pictures.

[0043] A GDR random-access operation starts by decoding the GDR picture. The GDR picture is typically coded using a mix of Intra and Inter blocks such that the Intra blocks provide a refreshed region, also called the “clean area”, when decoded. The Inter blocks may refer to decoded pictures that the decoder does not have, but VVC specifies that decoding shall proceed anyway. The VVC specification says these unavailable decoded pictures should be generated and the sample values of those pictures should be set to a mid value. The decoded unrefreshed area is commonly referred to as “the dirty area”. The border between the refreshed/clean area and the unrefreshed/dirty area may be referred to as the GDR boundary. The encoder constructs the bitstream such that the clean area increases in size with each decoded recovering picture until the recovery point picture.

[0044] When performing a GDR random-access operation, the decoder does not output the GDR picture or recovering pictures to avoid output of unrefreshed pictures. A special case is if the GDR picture is the same picture as the recovery point picture.
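A decoder performing GDR random access can classify pictures relative to the GDR picture and the recovery point, as described above. The sketch below is a simplification that assumes pictures can be ordered by POC; the function name and role labels are hypothetical.

```python
from typing import Tuple


def gdr_picture_role(poc: int, gdr_poc: int, delta_poc: int) -> Tuple[str, bool]:
    """Classify a picture during a GDR random-access operation.

    Returns (role, output_allowed). Simplification: pictures are ordered by POC.
    """
    recovery_poc = gdr_poc + delta_poc
    if poc == recovery_poc:          # also covers the special case delta_poc == 0
        return "recovery_point", True
    if poc == gdr_poc:
        return "gdr", False          # not output, to avoid showing unrefreshed content
    if gdr_poc < poc < recovery_poc:
        return "recovering", False   # not output either
    return "other", True


print(gdr_picture_role(20, 16, 8))  # ('recovering', False)
```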

[0045] GDR pictures may be provided at regular intervals in a bitstream. This means that a decoder may first perform a GDR random-access operation, and thereafter decode many succeeding GDR pictures without performing random-access on any of them. Note that the samples in the clean area will get the same value when a GDR random-access operation is performed and when the video is continuously decoded. However, the sample values in the dirty area may differ for the two cases.

[0046] When GDR is used in VVC, a virtual boundary may be signalled for the GDR boundary. This disables in-loop filtering across the boundary, which prevents sample values in a dirty region from affecting samples in a clean region through filtering.

[0047] When a GDR random access has been done, the sample values of the decoded recovery point picture must be identical to the sample values that would be decoded at continuous decoding. It is the responsibility of the encoder to ensure that this is always the case. To do this, the encoder needs to ensure that no information from the dirty area is referred to when decoding clean area sample values, for example in motion compensation and in-loop filtering processes. Typically, Intra coded block columns or block rows are used to progressively refresh the video.

[0048] Virtual boundaries can in this example be used to disable in-loop filtering between the Intra/clean area and the dirty area. Note that for the example above, it would be fine to allow the dirty area to refer to the Intra/clean area but not the other way around. The VVC specification does not distinguish between clean and dirty areas; in-loop filtering is disabled both ways.
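The difference between VVC's two-way rule and the one-way alternative mentioned above can be expressed as a small predicate. This is an illustrative sketch with a hypothetical function name; it only models which area may read from which, not any actual filter.

```python
def may_filter_across(target_clean: bool, source_clean: bool, one_way: bool = False) -> bool:
    """Decide whether a filter producing a sample in the target area may read a
    sample from the source area.

    one_way=False models VVC: filtering across the clean/dirty boundary is disabled both ways.
    one_way=True models the alternative: dirty samples may use clean samples, not vice versa.
    """
    if target_clean == source_clean:
        return True              # same area: filtering is unaffected
    if not one_way:
        return False             # VVC: the virtual boundary blocks both directions
    return not target_clean      # only a dirty target may read a clean source


print(may_filter_across(False, True), may_filter_across(False, True, one_way=True))  # False True
```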

[0049] In contrast to VTM, which is the reference software for VVC, the ECM experimental software is aware of the clean and dirty areas. This was adopted from proposal JVET-Z0118 (Joint Video Experts Team (JVET) of ITU-T SG 16 WP 3 and ISO / IEC JTC 1 / SC 29, 26th Meeting, by teleconference, 20-29 April 2022, document JVET-Z0118-r3). ECM doesn’t include any new syntax elements compared to VTM. Instead, it relies on virtual boundary signalling and assumes a left-to-right wipe when GDR is used.

[0050] Proposal JVET-AE0145 (Joint Video Experts Team (JVET) of ITU-T SG 16 WP 3 and ISO / IEC JTC 1 / SC 29, 31st Meeting, Geneva, CH, 11-19 July 2023, document JVET-AE0145) suggested modifying the ECM GDR method to add a 1-bit flag per virtual boundary that would signal which side of the virtual boundary is dirty and which side is clean. The presence of this flag would be conditioned on the picture type: if the picture is a GDR picture or a recovering picture, the flag would be present; otherwise, it would not be present. Proposal JVET-AE0145 was not adopted into the ECM software.

[0051] 14. Exploration Experiment on Neural Network based Video Coding (NNVC)

[0052] At the 20th JVET meeting it was decided to set up an exploration experiment (EE) on neural network-based (NN-based) video coding. The exploration experiment continued at the subsequent JVET meetings 21 through 35 with many tests: NN-based in-loop filtering, NN-based post filtering, NN-based super resolution, and NN-based intra prediction. The first test is most relevant to this disclosure and is described further.

[0053] The contributions JVET-X0066 (Joint Video Experts Team (JVET) of ITU-T SG 16 WP 3 and ISO / IEC JTC 1 / SC 29, 24th Meeting, by teleconference, 6-15 October 2021, document JVET-X0066-v1) and JVET-Y0143 (Joint Video Experts Team (JVET) of ITU-T SG 16 WP 3 and ISO / IEC JTC 1 / SC 29, 25th Meeting, by teleconference, 12-21 January 2022, document JVET-Y0143-v2) are two successive contributions that describe NN-based in-loop filtering. Both contributions use the same NN models for filtering. The NN-based in-loop filter is placed before SAO and ALF, and the sample values before the deblocking filter are used as input to the filter. The output of the NN-based in-loop filter is mixed (blended) with the output of the deblocking filter and forwarded as the input to SAO. The purpose of using the NN-based in-loop filter is to improve the quality of the reconstructed samples. Here it is helpful that the NN model is non-linear. While deblocking, SAO and ALF all contain non-linear elements such as conditions, and are thus not strictly linear, all three of them are based on linear filters. In contrast, a sufficiently large NN model can in principle learn any non-linear mapping and is therefore capable of representing a wider class of functions compared to deblocking, SAO and ALF. In JVET-X0066 and JVET-Y0143, there are four NN models, i.e., four NN-based in-loop filters. In a refined version of that work presented in the contribution JVET-AB0052 (Joint Video Experts Team (JVET) of ITU-T SG 16 WP 3 and ISO / IEC JTC 1 / SC 29, 28th Meeting, Mainz, DE, 20-28 October 2022, document JVET-AB0052-v2), only two models are used: one model for luma samples and another model for chroma samples.
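The paragraph above states that the NN output is blended with the deblocking output but does not give the blending rule. The sketch below assumes a simple fixed-point weighted average with a hypothetical 6-bit weight, purely to illustrate the kind of weighting between NN-filtered and deblocked samples that this disclosure is concerned with; it is not the method of JVET-X0066 or JVET-Y0143.

```python
from typing import List

WEIGHT_SHIFT = 6               # hypothetical: weights expressed in units of 1/64
WEIGHT_ONE = 1 << WEIGHT_SHIFT


def blend_nn_with_deblocking(nn_out: List[int], dbf_out: List[int], weight: int) -> List[int]:
    """Fixed-point blend (w * nn + (64 - w) * dbf + 32) >> 6, with 0 <= w <= 64."""
    if not 0 <= weight <= WEIGHT_ONE:
        raise ValueError(f"weight must be in [0, {WEIGHT_ONE}]")
    rounding = 1 << (WEIGHT_SHIFT - 1)
    return [(weight * a + (WEIGHT_ONE - weight) * b + rounding) >> WEIGHT_SHIFT
            for a, b in zip(nn_out, dbf_out)]


print(blend_nn_with_deblocking([100, 200], [50, 100], 32))  # [75, 150]
```

A weight of 64 passes the NN output through unchanged and a weight of 0 keeps only the deblocked samples.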

[0054] Contribution JVET-AD0380 (Joint Video Experts Team (JVET) of ITU-T SG 16 WP 3 and ISO / IEC JTC 1 / SC 29, 30th Meeting, Antalya, TR, 21-28 April 2023, document JVET-AD0380-v7) proposed a new unified design for NN-based in-loop filtering, which captures the benefits of previous NN structures. The unified filter has only one NN model, which filters both luma and chroma samples in both intra and inter pictures. The unified filter takes six inputs: the reconstructed samples of luma and chroma before deblocking (‘rec’), the prediction samples of luma and chroma (‘pred’), the deblocking block boundary strength (BS) information of luma and chroma (‘bs’), the quantization parameter for a sequence (‘QPbase’), the quantization parameter for each slice (‘QPslice’), as well as information on whether a particular sample was intra-predicted, uni-predicted or bi-predicted (‘IPB’). These inputs first go through a convolutional layer (3x3 or 1x1) and a parametric rectified linear unit (PReLU) layer separately; then they are concatenated and fused together with a 1x1 convolutional layer.
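The input stage described above (per-input convolution and PReLU, concatenation, then a 1x1 fusion convolution) can be sketched in plain Python. The per-input 3x3 or 1x1 convolutions are omitted and each branch is assumed to already be the output of its own convolution; function names, shapes, and weights are illustrative assumptions.

```python
from typing import List

Channel = List[float]       # one feature map, flattened over pixels
FeatureMap = List[Channel]  # a stack of channels


def prelu(x: float, slope: float) -> float:
    """Parametric ReLU: identity for x >= 0, learned slope for x < 0."""
    return x if x >= 0 else slope * x


def fuse_inputs(branches: List[FeatureMap], slopes: List[float],
                weights: List[List[float]], bias: List[float]) -> FeatureMap:
    """Apply PReLU per branch, concatenate along channels, then a 1x1 convolution."""
    activated = [[[prelu(v, s) for v in ch] for ch in fm] for fm, s in zip(branches, slopes)]
    channels = [ch for fm in activated for ch in fm]  # channel-wise concatenation
    if any(len(row) != len(channels) for row in weights):
        raise ValueError("weights need one column per concatenated channel")
    n_pix = len(channels[0])
    # A 1x1 convolution is a per-pixel linear combination of the input channels.
    return [[b + sum(w * ch[p] for w, ch in zip(row, channels)) for p in range(n_pix)]
            for row, b in zip(weights, bias)]


# Two single-channel branches of two pixels, fused into one output channel.
print(fuse_inputs([[[1.0, -2.0]], [[3.0, 4.0]]], [0.5, 0.25], [[1.0, 1.0]], [0.0]))  # [[4.0, 3.0]]
```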

[0055] The NN-based in-loop filters presented in JVET-X0066, JVET-AB0053, JVET-AB0052, and JVET-AD0380 increase the compression efficiency of the codec substantially, i.e., they lower the bit rate substantially without lowering the objective quality as measured by MSE-based PSNR. Increases in compression efficiency, typically referred to simply as “gain”, are often measured as the Bjontegaard-delta rate (BDR) against an anchor. As an example, a BDR of -1% means that the same peak signal-to-noise ratio (PSNR) distortion can be reached with 1% bitrate saving on average. As reported in JVET-AF0041 (Joint Video Experts Team (JVET) of ITU-T SG 16 WP 3 and ISO / IEC JTC 1 / SC 29, 32nd Meeting, Hannover, DE, 13-20 October 2023, document JVET-AF0041-v3), for the random access (RA) configuration, the BDR for the luma component (Y) is -10.27%, and for the all-intra (AI) configuration, the BDR for the luma component is -7.86%. The complexity of NN models used for compression is often measured in multiply-accumulate operations per pixel (MAC / pixel). The high bitrate savings of an NN model are typically directly related to the complexity of the NN model. The model described in JVET-AF0041 has a complexity of 477 kMAC / pixel, i.e., 477,000 multiply-accumulate operations per pixel. There are also other measures of complexity, such as total model size in terms of stored parameters.

[0056] To further investigate the tradeoff between the compression efficiency and complexity of the NN-based in-loop filters, new operation points have been introduced. The two new operation points are: (1) a low operation point (LOP), with a complexity of about 17 kMAC / pixel, and (2) a very low operation point (VLOP), with a complexity of about 5 kMAC / pixel.
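To put the kMAC / pixel figures into perspective, the sketch below multiplies them by the number of pixels in a 1920x1080 frame. It assumes that "per pixel" means per luma sample position, and the operation-point values are the approximate figures quoted above; the names are hypothetical.

```python
# Approximate complexities quoted above, in kMAC per pixel.
OPERATION_POINTS = {"JVET-AF0041 model": 477, "LOP": 17, "VLOP": 5}


def macs_per_frame(kmac_per_pixel: int, width: int, height: int) -> int:
    """Multiply-accumulate operations needed to filter one width x height frame."""
    return kmac_per_pixel * 1000 * width * height


for name, kmac in OPERATION_POINTS.items():
    gmac = macs_per_frame(kmac, 1920, 1080) / 1e9
    print(f"{name}: about {gmac:.1f} GMAC per 1920x1080 frame, "
          f"{gmac * 60 / 1000:.2f} TMAC/s at 60 fps")
```

Under these assumptions the 477 kMAC / pixel model needs roughly 989 GMAC per 1080p frame, about 28 times the LOP figure, which illustrates why the lower operation points were introduced.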

[0057] A Low Operation Point (LOP) architecture NN loop filter used in a JVET EE test (EE1-1.0) is illustrated in figure 1 from JVET-AG2023 (Joint Video Experts Team (JVET) of ITU-T SG 16 WP 3 and ISO / IEC JTC 1 / SC 29, 33rd Meeting, by teleconference, 17-26 January 2024, document JVET-AG2023-v4). In figure 1 from JVET-AG2023 the input patch size is 144x144 and there is a final cropping step that crops 8 pixels from each side of the output luma patch and 4 pixels from each side of each chroma patch (which has half the size of the luma patch). Hence the final patch size after cropping in the output is equal to 128x128 for luma and 64x64 for the chroma branch. The complexity of this current model is about 17 kMAC per sample. This means that about 17,000 multiplications are performed for calculating one sample value in the output of neural network loop filter architecture in figure 1 of JVET-AG2023.
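The cropping arithmetic above can be checked with a few lines of code. The defaults reproduce the 144x144 input and the 8- and 4-sample crops described for figure 1 of JVET-AG2023; the patch-count helper additionally assumes that output patches tile the picture on a regular grid, which is an assumption, not something stated above. Both function names are hypothetical.

```python
import math
from typing import Tuple


def lop_output_patch(input_luma: int = 144, luma_crop: int = 8,
                     chroma_crop: int = 4, chroma_factor: int = 2) -> Tuple[int, int]:
    """Return the (luma, chroma) output patch side after cropping each side."""
    out_luma = input_luma - 2 * luma_crop
    input_chroma = input_luma // chroma_factor  # chroma patch is half the luma size
    out_chroma = input_chroma - 2 * chroma_crop
    return out_luma, out_chroma


def patches_per_picture(pic_w: int, pic_h: int, patch: int) -> int:
    """Assumed regular grid of output patches covering the whole picture."""
    return math.ceil(pic_w / patch) * math.ceil(pic_h / patch)


print(lop_output_patch())                    # (128, 64)
print(patches_per_picture(1920, 1080, 128))  # 135
```

The cropped border means neighboring input patches overlap, so each output sample is computed with enough surrounding context.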

[0058] The architecture of VLOP is similar to LOP, but the complexity is only about 5 kMAC / pixel. Compared to LOP, VLOP has a smaller number of backbone blocks and channels.

[0059] In JVET standardization, development and study of codecs is done using common test conditions (CTC). The CTC specifies how a codec under test should be configured and what test sequences should be used. Keeping the test conditions static enables apples-to-apples evaluations, but with the drawback that configurations outside the CTC are not tested much and the codec performance may become too optimized towards the CTC.

[0060] Because the luma and chroma components of the sample values are usually input to the neural network as separate input channels, it is common to refer to the luma and chroma components as luma and chroma channels as well. The terms luma or chroma channels and luma or chroma components may hence be used interchangeably in this document.

[0061] 15. NNVC 10.0 syntax and semantics

[0062] The focus of this section is on the NN architecture in NNVC-10.0, which is the currently available version of the software. Note that there is no specification text containing the syntax and semantics shown in this section; the following is based on the NNVC software available in the HHI repository https://vcgit.hhi.fraunhofer.de/jvet-ahg-nnvc/VVCSoftware_VTM, where the NNVC-10.0 software is tagged as v10rc.

[0063] 15.1. Sequence parameter set

[0064] The syntax table below is derived from NNVC-10.0 when NN_LF_UNIFIED=1.

[0065] The semantics of the syntax elements shown above are described below.

[0066] sps_nnlf_enabled_flag equal to 1 specifies that NN in-loop filtering is enabled for the CLVS. sps_nnlf_enabled_flag equal to 0 specifies that NN in-loop filtering is disabled for the CLVS.

[0067] sps_nnlf_model_id equal to 0 specifies that NN in-loop filtering set 0 may be used, sps_nnlf_model_id equal to 1 specifies that NN in-loop filtering set 1 may be used, sps_nnlf_model_id equal to 2 specifies that the LOP1 in-loop filtering may be used, sps_nnlf_model_id equal to 3 specifies that the HOP or LOP in-loop filtering may be used, sps_nnlf_model_id equal to 4 specifies that LOP3 in-loop filtering may be used, and sps_nnlf_model_id equal to 5 specifies that HOP4 in-loop filtering may be used.

[0068] sps_nnlf_unified_infer_size_base specifies the base inference size of the NN-based in-loop filter. When not present in the bitstream, the value of sps_nnlf_unified_infer_size_base is inferred to be equal to 128.

[0069] sps_nnlf_unified_inf_size_ext specifies the extension of the inference size of the NN-based in-loop filter. When not present in the bitstream, the value of sps_nnlf_unified_inf_size_ext is inferred to be equal to 8.

[0070] sps_nnlf_unified_max_num_prms specifies the number of the conditional parameters of the NN-based in-loop filter. When not present in the bitstream the value of sps_nnlf_unified_max_num_prms is inferred to be equal to 2.

[0071] sps_nn_intra_pred_enabled_flag equal to 1 specifies that NN intra prediction tool is enabled for the CLVS. sps_nn_intra_pred_enabled_flag equal to 0 specifies that NN intra prediction tool is disabled for the CLVS.
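A decoder-side container for these SPS elements can apply the inference rules above by using the inferred values as defaults. This is a sketch with hypothetical class and function names; the presence conditions from the syntax table are not modeled, and because no inference rule is given above for sps_nnlf_enabled_flag or sps_nn_intra_pred_enabled_flag, both are required here.

```python
from dataclasses import dataclass, fields
from typing import Dict, Optional

# Meaning of sps_nnlf_model_id values as listed in the semantics above.
NNLF_MODEL_ID = {0: "NN in-loop filtering set 0", 1: "NN in-loop filtering set 1",
                 2: "LOP1", 3: "HOP or LOP", 4: "LOP3", 5: "HOP4"}


@dataclass
class SpsNnlfParams:
    sps_nnlf_enabled_flag: int
    sps_nn_intra_pred_enabled_flag: int
    sps_nnlf_model_id: Optional[int] = None
    # Defaults below are the values inferred when the element is not present.
    sps_nnlf_unified_infer_size_base: int = 128
    sps_nnlf_unified_inf_size_ext: int = 8
    sps_nnlf_unified_max_num_prms: int = 2


def sps_from_parsed(values: Dict[str, int]) -> SpsNnlfParams:
    """Build the parameters from parsed syntax elements, inferring absent ones."""
    known = {f.name for f in fields(SpsNnlfParams)}
    unknown = set(values) - known
    if unknown:
        raise KeyError(f"unknown syntax elements: {sorted(unknown)}")
    sps = SpsNnlfParams(**values)
    if sps.sps_nnlf_model_id is not None and sps.sps_nnlf_model_id not in NNLF_MODEL_ID:
        raise ValueError(f"unsupported sps_nnlf_model_id {sps.sps_nnlf_model_id}")
    return sps


sps = sps_from_parsed({"sps_nnlf_enabled_flag": 1, "sps_nn_intra_pred_enabled_flag": 0,
                       "sps_nnlf_model_id": 2})
print(sps.sps_nnlf_unified_infer_size_base, NNLF_MODEL_ID[sps.sps_nnlf_model_id])  # 128 LOP1
```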

[0072] Further, in the NNVC-10.0 software, if the slice is an I slice, the patch size is set equal to sps_nnlf_unified_infer_size_base << 1, where << 1 is a bit left-shift operation that doubles the signalled value. If the slice is not an I slice, for QP<29 the patch size is set equal to sps_nnlf_unified_infer_size_base, and for QP>29 the patch size is set equal to sps_nnlf_unified_infer_size_base if the picture width is smaller than 823, and to sps_nnlf_unified_infer_size_base << 1 for wider pictures, i.e., if the picture width is larger than or equal to 823.
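The patch size selection described above can be written as a small function. The text covers QP<29 and QP>29 but not QP equal to 29; the sketch assumes, as a labeled guess, that QP equal to 29 follows the QP>29 branch. The function name is hypothetical.

```python
def nnvc_patch_size(is_intra_slice: bool, qp: int, pic_width: int, base: int) -> int:
    """Select the NN loop filter patch size from the slice type, QP and picture width.

    base is sps_nnlf_unified_infer_size_base.
    """
    if is_intra_slice:
        return base << 1                 # I slice: doubled base size
    if qp < 29:
        return base
    # Assumption: QP == 29 is not covered by the description and is treated like QP > 29.
    return base if pic_width < 823 else base << 1


print(nnvc_patch_size(False, 37, 1920, 128))  # 256
```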

[0073] 15.2. Slice header

[0074] The syntax table below for the slice header is derived from NNVC-10.0.

[0075] The semantics of the syntax elements shown above are described below.

[0076] slice_nnlf_unified_mode equal to 0 specifies that no filtering is done for the slice. slice_nnlf_unified_mode equal to 1 specifies that filtering is done with prmId equal to 0 for all NN blocks in the slice. slice_nnlf_unified_mode equal to 2 specifies that filtering is done with prmId equal to 1 for all NN blocks in the slice. slice_nnlf_unified_mode equal to 3 specifies that prmId is decoded for each NN block in the slice.

[0077] slice_nnlf_unified_scale_flag equal to 1 specifies that there are scaling factors in the bitstream for NN in-loop filtering. slice_nnlf_unified_scale_flag equal to 0 specifies that no scaling factor for NN in-loop filtering is signalled in the bitstream.

[0078] y_nnScale specifies the Y component of the NN in-loop filter scale factor.

[0079] cb_nnScale specifies the Cb component of the NN in-loop filter scale factor.

[0080] cr_nnScale specifies the Cr component of the NN in-loop filter scale factor.
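The slice_nnlf_unified_mode semantics in [0076] can be illustrated with the following minimal Python sketch, which maps the mode to a parameter identifier (prmId) for each NN block in the slice. The function name is hypothetical, None stands for "no NN filtering", and for mode 3 the per-block prmId values are assumed to have been parsed from the bitstream already; the parsing itself is not shown.

```python
from typing import List, Optional, Sequence


def nn_block_prm_ids(slice_mode: int, num_blocks: int,
                     decoded_prm_ids: Optional[Sequence[int]] = None) -> List[Optional[int]]:
    """Map slice_nnlf_unified_mode to a prmId per NN block (None = no NN filtering)."""
    if slice_mode == 0:
        return [None] * num_blocks
    if slice_mode in (1, 2):
        return [slice_mode - 1] * num_blocks  # mode 1 -> prmId 0, mode 2 -> prmId 1
    if slice_mode == 3:
        # prmId is decoded per NN block; bitstream parsing is outside this sketch.
        if decoded_prm_ids is None or len(decoded_prm_ids) != num_blocks:
            raise ValueError("mode 3 needs one decoded prmId per NN block")
        return list(decoded_prm_ids)
    raise ValueError(f"invalid slice_nnlf_unified_mode: {slice_mode}")


print(nn_block_prm_ids(2, 3))          # [1, 1, 1]
print(nn_block_prm_ids(3, 2, [0, 1]))  # [0, 1]
```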

[0081] 15.3. Residue scaling

[0082] In the NNVC-10.0 software, when the NN loop filter is applied to reconstructed pictures, a scaling factor is derived and signalled for each color component in the slice header (see the syntax table above). The derivation is based on a least-squares method. The difference between the input samples and the NN-filtered samples (the residue) is scaled by the scaling factor before being added to the input samples.

SUMMARY

[0083] Certain challenges presently exist. For instance, in the ongoing exploration experiments on NN-based video coding (EE1) in JVET, the high-level syntax aspect for NN loop filters seems currently underexplored. EE1 looks into a few NN loop filter design points (operation points) and architectures (e.g., HOP, LOP, and VLOP models), each taking in inputs and providing output in particular formats (see for instance item 14 above for some examples) while interacting with other parts of, or mechanisms in, the encoder and decoder using limited defined options.

[0084] At present, there seems to be a lack of high-level syntax mechanisms for governing the cases where an NN-based loop filter interacts with other known concepts or mechanisms in block-based image or video codecs. Some of these concepts or mechanisms are picture blocks and units (CTU, CU, decoding unit (DU), etc.), picture partitioning mechanisms (slices, tiles, subpictures, etc.), wavefront coding, virtual boundaries, other existing loop filters, etc. As a result, the interaction of the NN loop filters with other concepts and mechanisms in place may not be well exploited, for instance towards higher coding gains, lower computational complexity, or codec configuration flexibility.

[0085] Moreover, some of the high-level signalling aspects of the state-of-the-art NN loop filters, such as the ones under exploration in JVET EE1, are more experimental in nature. Hence these aspects may not be flexible, efficient, or diverse enough to fully utilize the benefits of the NN loop filters.

[0086] For instance, for the weighting mechanism between the outputs from the deblocking filters and NN-based loop filters, a weight factor is signalled for each NN block so that the output from the blending operation is $R = w \times R_{NN} + (1 - w) \times R_{DB}$, where $R$ is the result after blending, $R_{NN}$ and $R_{DB}$ are the results from the NN in-loop filter and the deblocking filter, respectively, and $w$ is a weight factor from the set of predefined weight factors {0, 0.25, 0.5, 0.75, 1}. Restricting the choice to a limited set of weight factors, and signalling only one weight factor per NN block, is not designed for flexibility or versatility of the codec, or to optimize performance.

[0087] Accordingly, in one aspect there is provided a method for decoding a picture from a bitstream. The method includes obtaining weighting information including at least a first weighting information component and a second weighting information component specifying one or more weight factors specifying a weighting between first-NNLF-output and non-first-NNLF-output. The one or more weight factors include at least a first weight factor, and the first-NNLF-output is produced using a first neural network (NN) loop filter (NNLF), based on an NN block of an input block of the picture. The method also includes decoding the picture using the first and the second weighting information components.

[0088] In another aspect there is provided a method for producing a bitstream. The method includes obtaining weighting information including at least a first weighting information component and a second weighting information component specifying one or more weight factors specifying a weighting between first-NNLF-output and non-first-NNLF-output. The one or more weight factors include at least a first weight factor, and the first-NNLF-output is produced using a first NNLF, based on an NN block of an input block of a picture. The method also includes including the weighting information in the bitstream.

[0089] In some aspects, there is provided a computer program comprising instructions which, when executed by processing circuitry of an apparatus, cause the apparatus to perform any of the methods disclosed herein. In one embodiment, there is provided a carrier containing the computer program, wherein the carrier is one of an electronic signal, an optical signal, a radio signal, and a computer readable storage medium. In another aspect there is provided an apparatus that is configured to perform the methods disclosed herein. The apparatus may include memory and processing circuitry coupled to the memory.

[0090] An advantage of embodiments disclosed herein is that they improve performance, decrease complexity, increase flexibility, and / or decrease energy consumption, of the codec by regulating the interaction of the NN loop filter with the rest of the codec. For example, the applicability of the NN loop filters is extended, in some embodiments, to more pictures or parts of pictures in the bitstream by designing the mechanisms that are required but not existing yet for applicability of the NN loop filters to those extra pictures or picture areas. This is beneficial because NN loop filters have shown a good potential in improving the compression efficiency in image and video codecs.

[0091] In addition, regulating, harmonizing, and facilitating the interaction of the NN loop filter with other concepts and mechanisms in a block-based image or video codec through high-level mechanisms improves the overall performance of the codec through more timely usage of the NN loop filters and usage of the NN loop filters with the right input and settings. The proposed solutions could also simplify the architecture and / or indirectly affect the training of the NN loop filter.

[0092] Performance and flexibility are improved by, for example, allowing, in some cases, a finer granularity of the weighting between NN loop filter output and second / deblocking output, e.g., per-CU weighting instead of always per-CTU weighting between the NN loop filter output and second / deblocking output, which may in turn allow for a better compression tradeoff for some content.

BRIEF DESCRIPTION OF THE DRAWINGS

[0093] The accompanying drawings, which are incorporated herein and form part of the specification, illustrate various embodiments.

[0094] FIG. 1 illustrates a system according to an embodiment.

[0095] FIG. 2 is a schematic block diagram of an encoder according to an embodiment.

[0096] FIG. 3 is a schematic block diagram of a decoder according to an embodiment.

[0097] FIG. 4A is a schematic block diagram of a loop filter stage (LFS) according to an embodiment.

[0098] FIG. 4B is a schematic block diagram of at least a portion of a loop filter stage (LFS) according to an embodiment.

[0099] FIG. 4C is a schematic block diagram of at least a portion of a loop filter stage (LFS) according to an embodiment.

[0100] FIG. 4D is a schematic block diagram of at least a portion of a loop filter stage (LFS) according to an embodiment.

[0101] FIG. 5 illustrates an input block and an input patch according to an embodiment.

[0102] FIG. 6 illustrates a picture having multiple NN blocks according to an embodiment.

[0103] FIG. 7 illustrates weights being assigned to an NN block positioned a distance D from a subpicture boundary according to an embodiment.

[0104] FIG. 8 illustrates weights being assigned to an NN block positioned a distance D from a subpicture boundary according to an embodiment.

[0105] FIG. 9 illustrates how different weight factors can be signaled and corresponding gradient patterns for the weight factors.

[0106] FIG. 10 is a flowchart illustrating a process according to an embodiment.

[0107] FIG. 11 is a flowchart illustrating a process according to an embodiment.

[0108] FIG. 12 is a block diagram of an encoding apparatus according to an embodiment.

DETAILED DESCRIPTION

[0109] Terminology:

[0110] The following terminology is used herein:

[0111] FIG. 1 illustrates a system 100 according to an embodiment. System 100 includes an encoder 102 and a decoder 104. In some embodiments, encoder 102 is in communication with decoder 104 via a network 110 (e.g., the Internet or other network). Encoder 102 encodes a source video sequence 101 into a bitstream comprising an encoded video sequence and may transmit the bitstream to decoder 104 via network 110. In some embodiments, encoder 102 is not in communication with decoder 104, and, in such an embodiment, rather than transmitting the bitstream to decoder 104, the bitstream is stored in a data storage unit 190 and decoder 104 retrieves the bitstream containing the encoded video sequence from data storage unit 190. Data storage unit 190 may be collocated with encoder 102 or may be remote from encoder 102.

[0112] Decoder 104 decodes the pictures included in the encoded video sequence to produce decoded video data for display and / or post processing (e.g., a machine vision task). Accordingly, decoder 104 may be part of a device 103 having an image processor 105 (e.g., a postfilter or other image processor) and / or a display 106. The image processor 105 may perform machine vision tasks on the decoded pictures. One such machine vision task may be identifying an object in the picture and creating a 3D model of the object. The image processor 105 may also comprise or consist of a postfilter, such as a neural network postfilter (NNPF), that may receive reconstructed pictures from decoder 104 and may process the reconstructed pictures using a specified postfilter. In the embodiment shown, processor 105 is separate from decoder 104, but in other embodiments, processor 105 may be a component of decoder 104. The device 103 may be a mobile device, a set-top device, a head-mounted display, or any other device.

[0113] FIG. 2 illustrates functional components of encoder 102 according to some embodiments. It should be noted that encoders may be implemented differently, so implementations other than this specific example can be used. Encoder 102 employs a subtractor 241 to produce a residual block, which is the difference in sample values between an input block and a prediction block (i.e., the output of a selector 251, which is either an inter prediction block output by an inter predictor 250 (a.k.a., motion compensator) or an intra prediction block output by an intra predictor 249). Then a forward transform 242 is performed on the residual block to produce a transformed block comprising transform coefficients. A quantization unit 243 quantizes the transform coefficients based on a quantization parameter (QP) value (e.g., a QP value obtained based on a picture QP value for the picture of which the input block is a part and a block-specific QP offset value for the input block), thereby producing quantized transform coefficients, which are then encoded into the bitstream by encoder 244 (e.g., an entropy encoder); the bitstream with the encoded transform coefficients is output from encoder 102. Next, encoder 102 uses the quantized transform coefficients to produce a reconstructed block. This is done by first applying inverse quantization 245 and inverse transform 246 to the quantized transform coefficients to produce a reconstructed residual block and then using an adder 247 to add the prediction block to the reconstructed residual block, thereby producing the reconstructed block, which is stored in the reconstructed picture buffer (RPB) 266. Loop filtering by a loop filter stage (LFS) 267 is applied and the final decoded picture is stored in a decoded picture buffer (DPB) 268, where it can then be used by the inter predictor 250 to produce an inter prediction block for the next picture to be processed. As illustrated in FIG. 4, LFS 267 may include multiple sub-stages: i) a deblocking filter 402, ii) an NN filter 404 (a.k.a., “NN loop filter (NNLF)”), iii) a sample adaptive offset (SAO) filter 406, and iv) an adaptive loop filter (ALF) 408.

[0114] FIG. 3 illustrates functional components of decoder 104 according to some embodiments. It should be noted that decoder 104 may be implemented differently, so implementations other than this specific example can be used. Decoder 104 includes a decoder module 361 (e.g., an entropy decoder) that decodes quantized transform coefficient values of a block from the bitstream. Decoder 104 also includes a reconstruction stage 398 in which the quantized transform coefficient values are subject to an inverse quantization process 362 and inverse transform process 363 to produce a residual block. This residual block is input to adder 364 that adds the residual block and a prediction block output from selector 390 to form a reconstructed block. Selector 390 selects to output either an inter prediction block or an intra prediction block. The reconstructed block is stored in an RPB 365. The inter prediction block is generated by the inter prediction module 350 and the intra prediction block is generated by the intra prediction module 369. Following the reconstruction stage 398, an LFS (loop filter stage) 367 applies one or more filters to the reconstructed blocks, and the final decoded picture may be stored in a DPB 368 and output to post processor 105 for post processing and / or to display 106. Pictures are stored in the DPB for two primary reasons: 1) to wait for picture output and 2) if the picture is a reference picture, to be used for reference when decoding future pictures. In some embodiments, post processor 105 may receive the reconstructed picture before all loop filters are applied.

[0115] FIG. 4A illustrates LFS 267 and LFS 367 according to one embodiment. In the embodiment shown, the LFS includes: i) a deblocking filter 402, ii) an NNLF 404, iii) a sample adaptive offset (SAO) filter 406, and iv) an adaptive loop filter (ALF) 408. As shown in FIG. 4A, NNLF 404 is placed before SAO 406 and ALF 408, and the sample values before deblocking filter 402 are used as input to NN-based filter 404. In this embodiment, the output of NN-based filter 404 is mixed (blended) with the output of deblocking filter 402 and forwarded as the input to SAO 406. Using NN-based filter 404 in this way serves to improve the quality of the reconstructed samples. In some embodiments, the blending of the outputs from deblocking filter 402 and NN filter 404 is expressed as $R = w \times R_{NN} + (1 - w) \times R_{DB}$, where $R$ is the result after blending, $R_{NN}$ and $R_{DB}$ are the outputs from NN filter 404 and deblocking filter 402, respectively, and $w$ is a weight factor from a set of predefined weights. The weight factor may be signalled per NN block.

[0116] FIG. 4B, FIG. 4C, and FIG. 4D illustrate other embodiments of LFS 267, 367. As shown in FIG. 4B, the output of NNLF 404 may be blended with the input to NNLF 404. As shown in FIG. 4C, the output of NNLF 404 may be blended with the input to NNLF 404 and an output of another filter. As shown in FIG. 4D, the output of NNLF 404 may be blended with the output of deblocking filter 402 to produce a first blended output, and the first blended output may be blended with the input to NNLF 404.
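The blending topologies of FIG. 4A to FIG. 4D can be sketched as follows, operating on flat lists of sample values. This is a minimal sketch under stated assumptions: the helper names are hypothetical, and since the text does not say how the three weights in FIG. 4C are constrained, the sketch assumes they sum to one.

```python
from typing import List, Sequence


def blend(a: Sequence[float], b: Sequence[float], w: float) -> List[float]:
    """Two-way blend, sample by sample: w * a + (1 - w) * b."""
    return [w * x + (1.0 - w) * y for x, y in zip(a, b)]


def fig4a(nn: Sequence[float], db: Sequence[float], w: float) -> List[float]:
    return blend(nn, db, w)  # NNLF output blended with deblocking output


def fig4b(nn: Sequence[float], nn_in: Sequence[float], w: float) -> List[float]:
    return blend(nn, nn_in, w)  # NNLF output blended with the NNLF input


def fig4c(nn: Sequence[float], nn_in: Sequence[float], other: Sequence[float],
          w_nn: float, w_in: float) -> List[float]:
    # Three-way blend; the remaining weight goes to the other filter (assumption).
    w_other = 1.0 - w_nn - w_in
    return [w_nn * a + w_in * b + w_other * c for a, b, c in zip(nn, nn_in, other)]


def fig4d(nn: Sequence[float], db: Sequence[float], nn_in: Sequence[float],
          w1: float, w2: float) -> List[float]:
    first = blend(nn, db, w1)       # first blended output
    return blend(first, nn_in, w2)  # then blended with the NNLF input


print(fig4a([8.0], [4.0], 0.5))              # [6.0]
print(fig4d([8.0], [4.0], [0.0], 0.5, 0.5))  # [3.0]
```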

[0117] As described above, a challenge presently exists. For instance, in the ongoing exploration experiments on NN-based video coding (EE1) in JVET, the high-level syntax aspect for NN loop filters seems currently underexplored. Accordingly, in some embodiments, this disclosure introduces novel syntax elements, and methods for using those syntax elements, for governing the usage of the NN-based loop filter and its interaction with other tools, mechanisms, and parts of the codec. In one embodiment, the weighting between the output produced by an NN in-loop filter and the output produced by another filter, e.g., deblocking or SAO, is flexible. This is achieved by, for example, specifying extra weighting information components compared to the prior art, for instance by one or more of: specifying additional weight factors for a luma or chroma channel of one NN block, specifying the weight factor granularity, specifying the values a weight factor may have, using a varying weight or a weighting function for values in the NN block, or deriving one or more weight factors based on the position of the NN block in relation to a boundary such as a picture boundary. The extra weighting information components could be derived or explicitly signalled in the bitstream. One non-limiting example of information being explicitly signalled in the bitstream is when a syntax element in the bitstream exclusively represents a weighting information component value, and that is the sole purpose of that syntax element.

[0118] FIG. 5 illustrates an example input block 502 of a picture 600 (see FIG. 6) and an example input patch 508 corresponding to input block 502. Input block 502 includes an NN block 504 and a border area 506 that surrounds NN block 504. Some portion of the border area 506 of input block 502 may extend beyond picture 600 (see FIG. 6).

[0119] Input patch 508 includes a central area 510 and a margin 512 that surrounds central area 510. At least some of the values for input patch 508 are derived using the sample values of input block 502. Input patch 508 will be processed by NNLF 404.

[0120] FIG. 6 illustrates input block 502 in relation to picture 600 to which the input block 502 belongs. As shown in FIG. 6, picture 600 may for example be divided into multiple NN blocks. In FIG. 6, the NN-blocks are marked with solid lines, the input block area is marked with dotted lines, and the picture is marked in dashed lines. There could also be other ways of dividing the picture and extracting NN blocks, e.g., the NN blocks may be sparsely distributed, or they may be overlapping.

[0121] An input block, an NN block, and an input patch, may have multiple channels, e.g., one for luma and two for chroma. An input block, an NN block, and an input patch, may alternatively only comprise one of the channels. The size (or resolution) of the channels could be different, for instance the luma channel may have the size MxN but the chroma channels may have the size M / 2 x N / 2.

[0122] 1. Blending Sample Values using a Weight Factor

[0123] Conventionally, a weight factor $w$ is signalled for each NN block so that the final filtered samples $R$ are generated by blending the result of the deblocking filter and NNLF 404 according to $R = w \times R_{NN} + (1 - w) \times R_{LF}$, where $R_{NN}$ is the block of sample values output from NNLF 404, $R_{LF}$ is the block of sample values output from a loop filter (e.g., the DB filter or the ALF) other than NNLF 404, and $w$ is a weight factor selected from a set of predefined weight factors {0, 0.25, 0.5, 0.75, 1}. As an example, when $w$ is equal to 0.5, $R$ is set to the sum of half of $R_{NN}$ and half of $R_{LF}$. As another example, when $w$ is equal to 0.25, $R$ is set to the sum of 25 percent of $R_{NN}$ and 100 - 25 = 75 percent of $R_{LF}$. Conventionally, the weight factor for an NN block is sent separately for the luma and chroma channels. Hence the weight factor for the luma channel, $w_Y$, can be different from the weight factors for the chroma channels, $w_{Cb}$ and $w_{Cr}$, for one NN block. In this disclosure, sending one weight factor per Y, Cb, and Cr channel may also be referred to as sending one weight factor with three components.
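A minimal sketch of this conventional per-channel blending for one NN block, assuming each channel's weight is conveyed as an index into the predefined set {0, 0.25, 0.5, 0.75, 1}. The index-based representation and the function name are illustrative assumptions; the actual binarization of the weight factor is not described here.

```python
from typing import Dict, List, Sequence

# Predefined weight factors from the text, addressed by a hypothetical index.
WEIGHT_SET = (0.0, 0.25, 0.5, 0.75, 1.0)


def blend_nn_block(r_nn: Dict[str, Sequence[float]],
                   r_lf: Dict[str, Sequence[float]],
                   weight_idx: Dict[str, int]) -> Dict[str, List[float]]:
    """Blend each channel of one NN block with its own weight: w*R_NN + (1-w)*R_LF."""
    out: Dict[str, List[float]] = {}
    for ch in ("Y", "Cb", "Cr"):
        w = WEIGHT_SET[weight_idx[ch]]
        out[ch] = [w * a + (1.0 - w) * b for a, b in zip(r_nn[ch], r_lf[ch])]
    return out


r_nn = {"Y": [200.0], "Cb": [120.0], "Cr": [80.0]}
r_lf = {"Y": [100.0], "Cb": [100.0], "Cr": [100.0]}
print(blend_nn_block(r_nn, r_lf, {"Y": 1, "Cb": 2, "Cr": 4}))
# {'Y': [125.0], 'Cb': [110.0], 'Cr': [80.0]}
```

Here $w_Y = 0.25$, $w_{Cb} = 0.5$ and $w_{Cr} = 1$, i.e., one weight factor with three components.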

[0124] In this disclosure, the weighting between the NN loop filter output, the deblocking output, and the no-filter (unfiltered) output, is more flexible compared to the state of the art.

[0125] In the embodiments below, the sample values (a.k.a., output patch) from a first NNLF (e.g., NNLF 404) are referred to as the first-NNLF-output. Sample values that are produced without involvement of the first NNLF are referred to as non-first-NNLF-output. The embodiments below include descriptions of various methods for combining first-NNLF-output and one or more non-first-NNLF-outputs. Non-first-NNLF-output may here be an output produced by one or more non-NN filters, which may comprise one or more of: a deblocking filter, an ALF, an SAO filter, a bilateral filter, a Hadamard filter, an AV1 filter, an AV2 filter, or any other non-NN filter. Non-first-NNLF-output may also be an output produced by no in-loop filters at all (i.e., the unfiltered reconstruction). Non-first-NNLF-output may also be an output produced by a second NNLF.

[0126] In the embodiments below, weighting information is used to blend first-NNLF-output with non-first-NNLF-output (e.g., with a first non-NNLF-output such as the output from deblocking filter 402 as shown in FIG. 4A). Weighting information consists of one or more weighting information components, such as those specified below:
- a weight factor,
- an offset for a weight factor,
- a relative weight factor,
- a weight factor granularity,
- a candidate list of weight factors,
- an indicator in a candidate list of weight factors,
- information specifying a weight factor granularity,
- information specifying values a weight factor can have,
- a function defining the transition of a weight factor over an area of samples,
- a function specifying the blending of the first-NNLF-output and the non-first-NNLF-output,
- a rule or constraint that specifies if and / or how to apply a weight factor,
- information specifying if a weight factor is present in the bitstream,
- an indicator indicating whether to do a weighting of the filtering or not,
- a boundary condition, such as the distance of the NN block to a picture boundary being smaller than a threshold value.

[0127] Alternatively, the weighting information may consist of any information used for determining a weight factor for use in blending first-NNLF-output and non-first-NNLF-output.

[0128] 2. More than one weight factor for a channel of an NN block

[0129] In one embodiment, the weighting information specifies one or more weight factors specifying the weighting between first-NNLF-output and non-first-NNLF-output, wherein the weighting information includes at least a first weighting information component and a second weighting information component.

[0130] In one embodiment, the weighting information specifies at least two weight factors. In this embodiment, the at least two weight factors (a first weight factor and a second weight factor, signalled or derived) are assigned to an NN block for at least one of the luma or chroma channels for the NN block. In this version of the embodiment, the first weight factor is applied to a first set of one or more sample values in the NN block (i.e., a first sub-block of the NN block) and the second weight factor is applied to a second set of one or more sample values in the NN block (i.e., a second sub-block of the NN block).

[0131] In one embodiment, the NN block consists of two or more NN sub-blocks (or simply “sub-blocks” for short) and one weight factor is assigned to each sub-block, as illustrated in FIG. 7, which shows a first weight factor (w1) assigned to a first sub-block of an NN block 700 and a second weight factor (w2) assigned to a second sub-block of NN block 700.

[0132] The higher granularity of the weight factors may improve compression. That is, for example, weight factors tailored to smaller areas may improve compression over the case of one weight factor being applied to a larger area. In one example, the first-NNLF-output is better than the non-first-NNLF-output for a first sub-block, but the non-first-NNLF-output is better than the first-NNLF-output for a second sub-block. In this scenario, having separate weight factors for each of the first and second sub-blocks allows choosing a higher weight factor for the first-NNLF-output for the first sub-block and a lower weight factor for the first-NNLF-output for the second sub-block, which can improve the overall performance.

[0133] The weight factors for each sub-block may be signalled in the bitstream or derived. When the weight factors are signalled, the signalling may be explicit. One non-limiting example of a weight factor being explicitly signalled in the bitstream is when a syntax element in the bitstream exclusively represents the weight factor, and that is the sole purpose of that syntax element. As another example, an index corresponding to a specific weight factor for each sub-block is signalled.

[0134] In one example, a first NN block is aligned with a CTU which consists of a first and a second CU, and the first NN block has a first and a second sub-block which are aligned with the first and second CUs. In this example, two weight factors w1 and w2 are signalled, corresponding to the first and second sub-blocks, respectively.

[0135] In this example, the final filtered samples in a sub-block, $R_{SB}$, are calculated according to the weight factor for that particular sub-block, with the weighting operation being $R_{SB} = w_{SB} \times R_{NN,SB} + (1 - w_{SB}) \times R_{DB,SB}$, where $R_{NN,SB}$ is a sub-block of sample values output from NNLF 404 (e.g., a sub-block of first-NNLF-output), $R_{DB,SB}$ is the corresponding sub-block of sample values output from a filter other than NNLF 404 (e.g., a sub-block of non-first-NNLF-output), and $w_{SB}$ is the weight factor for the particular sub-block, so $w_{SB} = w_1$ for the first sub-block and $w_{SB} = w_2$ for the second sub-block.
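The per-sub-block weighting of [0134] and [0135] can be sketched in Python as below. This sketch assumes the sub-blocks are given as hypothetical (x0, y0, width, height) rectangles, for example aligned with CUs, that tile the NN block without overlap; samples not covered by any rectangle simply keep the non-NN value.

```python
from typing import List, Sequence, Tuple

# Hypothetical sub-block description: (x0, y0, width, height) in samples.
Rect = Tuple[int, int, int, int]


def blend_sub_blocks(r_nn: Sequence[Sequence[float]],
                     r_db: Sequence[Sequence[float]],
                     sub_blocks: Sequence[Tuple[Rect, float]]) -> List[List[float]]:
    """Apply R_SB = w_SB * R_NN,SB + (1 - w_SB) * R_DB,SB for each sub-block."""
    out = [list(row) for row in r_db]  # samples outside every sub-block keep R_DB
    for (x0, y0, width, height), w_sb in sub_blocks:
        for y in range(y0, y0 + height):
            for x in range(x0, x0 + width):
                out[y][x] = w_sb * r_nn[y][x] + (1.0 - w_sb) * r_db[y][x]
    return out


# A 2x4 NN block split into two 2x2 CU-aligned sub-blocks with w1 = 0.75, w2 = 0.25.
r_nn = [[10.0] * 4 for _ in range(2)]
r_db = [[2.0] * 4 for _ in range(2)]
print(blend_sub_blocks(r_nn, r_db, [((0, 0, 2, 2), 0.75), ((2, 0, 2, 2), 0.25)]))
# [[8.0, 8.0, 4.0, 4.0], [8.0, 8.0, 4.0, 4.0]]
```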

[0136] In one embodiment, a weight factor for a sub-block is derived based on: a sample value in the sub-block, a neighbouring sub-block’s weight factor, and / or a collocated sub-block’s weight factor. A collocated sub-block may be a sub-block from the current picture, for instance a sub-block from an NN block other than the current NN block in the current picture, or a sub-block from a picture other than the current picture, for instance a reference picture. In one example, a first weight factor is signalled for a first sub-block of the NN block, and a second weight factor for a second sub-block of the NN block is derived by adding a value to (or subtracting a value from) the first weight factor.
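The offset-based derivation in the last example can be written as a one-line rule. Clipping the result to [0, 1] is an assumption made here to keep the derived weight within the range of the predefined weight factors; the text only states that a value is added to or subtracted from the first weight factor. The function name is hypothetical.

```python
def derive_second_weight(first_weight: float, delta: float) -> float:
    """Second sub-block weight = first weight + delta, clipped to [0, 1] (clipping assumed)."""
    return min(1.0, max(0.0, first_weight + delta))


print(derive_second_weight(0.75, -0.25))  # 0.5
print(derive_second_weight(0.75, 0.5))    # 1.0 after clipping
```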

[0137] In another version, a list of candidate weight factors is produced for each sub-block and an index for an entry in the candidate list is signalled for each sub-block. The list may be ordered such that an early entry in the list is more efficient to signal, e.g., by using ue(v) Exp-Golomb coding.

[0138] A list of candidate weight factors may be derived from neighbouring or collocated sub-blocks, either directly as the weight factors used or as a blend between two neighbouring or collocated sub-blocks, e.g., as a blend between the weight factor of the neighbouring sub-block located to the bottom left of the current sub-block and the weight factor of the sub-block located at the top right of the current sub-block.
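The following sketch combines the candidate list of [0137] and [0138] with ue(v) Exp-Golomb coding of the signalled index, showing why early entries are cheaper: index 0 costs one bit, while indices 1 and 2 cost three bits. The construction order (bottom-left neighbour, top-right neighbour, their average, then fill values), the list size of four, and the fill values are hypothetical choices for illustration only.

```python
from typing import List, Optional

# Hypothetical fill values used when neighbours do not provide enough candidates.
DEFAULT_WEIGHTS = (0.0, 0.25, 0.5, 0.75, 1.0)


def ue_v(code_num: int) -> str:
    """Unsigned Exp-Golomb ue(v) code word for a non-negative integer, as a bit string."""
    value = code_num + 1
    prefix_len = value.bit_length() - 1  # number of leading zero bits
    return "0" * prefix_len + format(value, "b")


def candidate_list(w_bottom_left: Optional[float], w_top_right: Optional[float],
                   max_size: int = 4) -> List[float]:
    """Build a deduplicated candidate list, neighbour-derived entries first."""
    cands: List[float] = []

    def push(w: Optional[float]) -> None:
        if w is not None and w not in cands and len(cands) < max_size:
            cands.append(w)

    push(w_bottom_left)
    push(w_top_right)
    if w_bottom_left is not None and w_top_right is not None:
        push((w_bottom_left + w_top_right) / 2)  # blend of the two neighbours
    for w in DEFAULT_WEIGHTS:
        push(w)
    return cands


cands = candidate_list(0.25, 0.75)
print(cands)                   # [0.25, 0.75, 0.5, 0.0]
print(ue_v(cands.index(0.5)))  # 011
```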

[0139] In one embodiment, a global weight factor is signalled (e.g., for the NN block which is aligned to a CTU) and a local weight factor (e.g., for a sub-block of the NN block, such as a CU in that CTU) is only signalled if it is different from the global weight factor. In one embodiment, a flag is signalled indicating whether a separate weight factor for a current sub-block is signalled or not. In one embodiment, the global weight factor is signalled for the picture or a group of pictures and the local weight factor is signalled for a picture, an NN block, or a sub-block.
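A sketch of the global/local scheme with a per-sub-block flag follows. It assumes, for illustration only, that the flag and the optional local weight are represented as a Python tuple rather than as actual syntax elements; the function names are hypothetical. The encoder side sets the flag only when the local weight differs from the global one, and the decoder side falls back to the global weight when the flag is not set.

```python
from typing import List, Optional, Sequence, Tuple

# One entry per sub-block: (flag, local weight or None when the flag is 0).
Signalled = Tuple[bool, Optional[float]]


def encode_local_weights(global_w: float, local_ws: Sequence[float]) -> List[Signalled]:
    """Encoder side: signal a local weight only when it differs from the global one."""
    return [(w != global_w, w if w != global_w else None) for w in local_ws]


def decode_local_weights(global_w: float, signalled: Sequence[Signalled]) -> List[float]:
    """Decoder side: use the local weight when its flag is set, else the global one."""
    weights: List[float] = []
    for flag, local_w in signalled:
        if flag:
            if local_w is None:
                raise ValueError("flag set but no local weight present")
            weights.append(local_w)
        else:
            weights.append(global_w)
    return weights


syntax = encode_local_weights(0.5, [0.5, 0.75, 0.5])
print(syntax)                             # [(False, None), (True, 0.75), (False, None)]
print(decode_local_weights(0.5, syntax))  # [0.5, 0.75, 0.5]
```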

[0140] In one embodiment, dividing an NN block into sub-blocks is done not based on the location of the samples in the NN block, but rather based on the sample values themselves. For example, an NN block with values in the range from 0 to 255, which may be represented with 8 bits, may be divided into two sub-blocks: a first sub-block that consists of all sample values of the NN block that are in the range 0-10 or 245-255, and a second sub-block that consists of all sample values of the NN block that are in the range 11-244. In this embodiment, a first weight factor w1 is assigned to the first sub-block and a second weight factor w2 is assigned to the second sub-block. That is, in one example, weight factor w1 is applied to the sample values in the ranges 0-10 and 245-255, and weight factor w2 is applied to the sample values in the range 11-244 in the NN block.
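The value-based division can be sketched as follows for 8-bit samples stored in flat lists. Which sample values decide sub-block membership is not further specified; this sketch assumes the NN block's input sample values (before filtering) are used, so that encoder and decoder classify samples identically. The function name is hypothetical.

```python
from typing import List, Sequence


def value_based_blend(nn_block: Sequence[int], r_nn: Sequence[float],
                      r_other: Sequence[float], w1: float, w2: float) -> List[float]:
    """Use w1 for samples in 0-10 or 245-255 and w2 for samples in 11-244 (8-bit)."""
    out: List[float] = []
    for s, a, b in zip(nn_block, r_nn, r_other):
        w = w1 if (s <= 10 or s >= 245) else w2  # sub-block membership by value
        out.append(w * a + (1.0 - w) * b)
    return out


print(value_based_blend([5, 128, 250], [8.0, 132.0, 248.0], [4.0, 124.0, 252.0], 0.25, 0.75))
# [5.0, 130.0, 251.0]
```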

[0141] In one embodiment, the number of weight factors for one NN block, $N_w$, is specified. This number $N_w$ may be set by default, derived, or signalled in the bitstream. In one example, $N_w = 2$ means that two weight factors are specified for an NN block. In another variant, $N_{w,max}$ specifies the maximum number of weight factors that can be specified for one NN block. In one example, $N_{w,max} = 2$ means that at most two weight factors are specified for an NN block.

[0142] In one embodiment, the number of weight factors for an NN block could be different for luma and chroma channels. For instance, in one NN block, two weight factors are signalled for the luma channel but only one weight factor is signalled for the chroma channels.

[0143] In one embodiment, at least a first and a second weight factor are signalled for a first NN block, and for at least one sub-block of the first NN block it is specified which of the at least first and second weight factors is applied to that at least one sub-block.

[0144] In one embodiment, decoder 104 may perform all or a subset of the following steps to decode a picture:

[0145] Step 1:

[0146] Obtaining weighting information for the NNLF to be applied for one NN block, wherein the weighting information comprises at least a first weighting information component and a second weighting information component specifying a weighting between first-NNLF-output and non-first-NNLF-output. For example, the weighting information comprises: i) a first weighting information component specifying a first weight factor for use in blending a first set of samples produced by NNLF 404 using a first input patch generated using the NN block with a first set of samples produced by another filter based on the NN block, and ii) a second weighting information component specifying a second weight factor for use in blending a second set of samples produced by NNLF 404 using a second input patch generated using the NN block with a second set of samples produced by the other filter based on the NN block. In some embodiments, the first input patch and the second input patch may be the same input patch.

[0147] Step 2:

[0148] Decoding the picture using the first and the second weighting information components. For instance, using the example above where the weighting information specifies a first weight factor $w_{SB1}$ for a first subset of samples and a second weight factor $w_{SB2}$ for a second subset of samples, decoding the picture using the two weight factors ($w_{SB1}$ and $w_{SB2}$) may include performing the following calculations (where $\cup$ denotes the union operator):

$$R_{SB1} = w_{SB1} \times R_{NN,SB1} + (1 - w_{SB1}) \times R_{DB,SB1}$$

$$R_{SB2} = w_{SB2} \times R_{NN,SB2} + (1 - w_{SB2}) \times R_{DB,SB2}$$

$$R = R_{SB1} \cup R_{SB2}$$

[0149] In another embodiment, decoder 104 may perform all or a subset of the following steps for this embodiment to decode a picture:

[0150] Step 1:

[0151] Obtaining weighting information for the NNLF to be applied for one NN block, wherein the weighting information comprises at least a first and a second weighting information component specifying one or more weight factors specifying the weighting between first-NNLF-output and non-first-NNLF-output.

[0152] Step 2:

[0153] Obtaining the one or more weight factors from the first and / or second weighting information component.

[0154] Step 3:

[0155] Decoding the picture using the one or more weight factors.

[0156] In another embodiment, decoder 104 may perform all or a subset of the following steps to decode a picture:

[0157] Step 1:

[0158] Obtaining weighting information for the NNLF to be applied for one NN block, wherein the weighting information comprises at least a first weighting information component and a second weighting information component specifying a weighting between first-NNLF-output and non-first-NNLF-output.

[0159] Step 2:

[0160] Obtaining at least a first weight factor and a second weight factor using at least the first weighting information component and / or the second weighting information component, for weighting between first-NNLF-output and non-first-NNLF-output, where the first weight factor is applied to a first value in the NN block and the second weight factor is applied to a second value in the NN block, as described above.

[0161] Step 3:

[0162] Decoding the picture using the first and the second weight factors.

[0163] In some embodiments, the steps relating to obtaining a weight factor may be replaced by, or comprise: obtaining a candidate list of weight factors, obtaining an index corresponding to a weight factor in the candidate list of weight factors, and / or selecting the weight factor from the candidate list of weight factors based on the index.
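The candidate-list variant of [0163] can be sketched in a few lines. The candidate values and the name `select_weight_factor` are illustrative assumptions; the text does not fix the contents of the list or how the index is coded.

```python
# Sketch of [0163]: the weight factor is selected from a candidate list by a
# decoded index. The candidate values below are an assumed example only.

EXAMPLE_CANDIDATES = (0.0, 0.25, 0.5, 0.75, 1.0)


def select_weight_factor(index, candidates=EXAMPLE_CANDIDATES):
    """Return the weight factor that the decoded index points to."""
    if not 0 <= index < len(candidates):
        raise ValueError(
            f"index {index} is outside a candidate list of size {len(candidates)}"
        )
    return candidates[index]
```

Signalling a short index rather than the weight value itself keeps the per-block syntax small when the candidate list is known to both encoder and decoder.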

[0164] 3. Weight Factor Granularity

[0165] In this embodiment, a weight factor granularity is obtained (it can be signalled or derived). A weight factor granularity defines a level to which one or more weight factors are applied. For example, a weight factor may be applied to a sub-block, a CU, a particular CU split level, a CTU, an NN block, a slice, a subpicture, a tile, an ROI, a picture, a group of pictures, or a temporal layer.

[0166] In one example of this embodiment, a weight factor granularity syntax element is included in the bitstream, and when set to a first value it indicates that the weight factor granularity is per NN block (e.g., per CTU if the CTUs are aligned with the NN blocks), and a weight factor is signalled or derived for each NN block, wherein the weight factor value can vary per NN block. In another example, which may be based on the previous example, when set to a second value, the granularity syntax element indicates that the weight factor granularity is per sub-block (e.g., per CU), and as a result a weight factor is signalled or derived for each sub-block.

[0167] In another example, a weight factor is signalled per NN block by default, unless it is signalled in the bitstream that a weight factor is signalled per sub-block for the current NN block or for the remaining NN blocks until further notice.

[0168] In another example, the weight factor granularity is specified to be up to a particular CU split level, for instance the second level of CU splitting. In this example, for the NN blocks that contain CUs which are split up to the second level of the CU splitting, a separate weight factor is specified, and, if a CU is split further, the same weight factor is applied to all the CUs that are obtained from further splitting that particular CU.
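The split-level rule of [0168] amounts to looking up the weight of a CU's ancestor at the maximum signalled split level. In this sketch a CU is identified by its split path from the NN block root, for example (2, 0, 3) for child 3 of child 0 of child 2; that convention and the name `weight_for_cu` are assumptions made for illustration.

```python
# Sketch of [0168]: separate weight factors down to a maximum CU split level;
# CUs split further reuse the weight factor of their ancestor at that level.
# The split-path convention is an assumption for illustration.


def weight_for_cu(split_path, weights, max_level):
    """Return the weight factor that applies to the CU at split_path.

    weights maps split paths of length <= max_level to weight factors.
    """
    ancestor = tuple(split_path[:max_level])
    return weights[ancestor]
```

With `max_level=2`, a CU reached by the path (0, 1, 2, 3) uses the weight stored for (0, 1), while a CU that was split only once, such as (1,), uses its own entry.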

[0169] In another example, which may be based on the previous example, a value for the granularity syntax element indicates that the weight factor granularity is per picture, and a weight factor is signalled for each picture. In yet another example, a value for the granularity syntax element indicates that the weight factor granularity is per group of pictures or per sequence (e.g., signalled in a PPS or SPS), and a weight factor is signalled for each group of pictures or for the sequence.

[0170] Accordingly, in one embodiment, decoder 104 may perform all or a subset of the following steps to decode a picture:

[0171] Step 1:

[0172] Obtain weighting information for the NNLF to be applied for one NN block, wherein the weighting information comprises at least a first and a second weighting information component.

[0173] Step 2:

[0174] Obtain weight factor granularity information from the first and / or second weighting information component.

[0175] Step 3:

[0176] Obtain one or more weight factors from the first and / or second weighting information component.

[0177] Step 4:

[0178] Decode the picture using the weight factor granularity information and the one or more weight factors.

[0179] 4. Values a Weight Factor Can Have

[0180] In this embodiment, the weighting information consists of information specifying the values a weight factor can have, e.g., specifying a set of two or more candidate weight factors.

[0181] In one variant of this embodiment, the number of values the weight factor can have is specified (explicitly or implicitly), e.g., the number of candidate weight factors included in the set of candidate weight factors is implicitly specified. For instance, it may be signalled that the weight factor has N steps (or N+1 values). In one example, the weight factors are equally spaced, and when N=4 is signalled, the values 0, 0.25, 0.5, 0.75, and 1 are the candidate weight factor values.

[0182] In another variant, the distance between at least two consecutive weight factors is specified. In one example, it is signalled that the possible weight factor values are 0.25 apart; hence, when starting from 0, the possible weight factor values are 0, 0.25, 0.5, 0.75, and 1, because a candidate weight factor value cannot exceed 1.

[0183] In another variant, the weight factors are not equally spaced. In one example of this variant, N=3 indicates the weight levels 0, 0.25, 0.5, and 1, where the distance between the first and the second weight factors is 0.25, but the distance between the third and the fourth weight factors is 0.5.
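The three ways of specifying the candidate weight factors in [0181] to [0183] can be sketched as below. Exact fractions avoid floating-point rounding. The function names are hypothetical, and the lookup table for the non-uniform case is an assumption that only encodes the N=3 example from the text.

```python
# Sketch of [0181]-[0183]: candidate weight factors from a step count, from a
# step distance, or from a non-uniform table (assumed, N=3 example only).
from fractions import Fraction


def candidates_from_steps(n):
    """N equal steps between 0 and 1 give the N + 1 candidates k / N."""
    if n < 1:
        raise ValueError("need at least one step")
    return [Fraction(k, n) for k in range(n + 1)]


def candidates_from_distance(step):
    """Equally spaced candidates starting at 0 that never exceed 1.

    step may be a Fraction or a decimal string such as "0.25".
    """
    step = Fraction(step)
    if step <= 0:
        raise ValueError("step must be positive")
    values, value = [], Fraction(0)
    while value <= 1:
        values.append(value)
        value += step
    return values


# Non-uniform candidates: N=3 signals the levels 0, 0.25, 0.5 and 1.
NON_UNIFORM_CANDIDATES = {3: [Fraction(0), Fraction(1, 4), Fraction(1, 2), Fraction(1)]}
```

Both `candidates_from_steps(4)` and `candidates_from_distance("0.25")` yield 0, 1/4, 1/2, 3/4 and 1, matching the examples in the text.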

[0184] In one embodiment, decoder 104 may perform all or a subset of the following steps for this embodiment to decode a picture:

[0185] Step 1:

[0186] Obtain weighting information to be applied for at least one NN block, wherein the weighting information comprises at least a first weighting information component specifying values a weight factor can have (e.g., specifying a set of two or more candidate weight factor values).

[0187] Step 2:

[0188] Obtain a first weight factor using the first weighting information component specifying the candidate weight factors.

[0189] Step 3:

[0190] Decoding the picture using the first weight factor for weighting between first-NNLF-output and non-first-NNLF-output.

[0191] 5. Weight Factor Based on one or more Boundary Conditions

[0192] In this embodiment, a weight factor is determined based on one or more boundary conditions, where a boundary may be: a picture boundary, a CTU boundary, a CU boundary, a block boundary, a virtual boundary (e.g., a virtual boundary within a CTU or within a known distance from a CTU, or a virtual boundary within an NN block or within a known distance from the NN block), a boundary of the NN block, a boundary of the input block, a boundary of a tile, slice, subpicture or other picture partition, a gradual decoder refresh boundary (e.g., a boundary between a refreshed and a non-refreshed area), a boundary of a region of interest (ROI), etc.

[0193] A boundary condition may comprise a distance D being smaller or larger than a threshold T, wherein the distance may be: (1) the distance from a boundary to a certain spatial position in the picture where a certain filtering rule applies, e.g., the distance to the picture boundary; (2) the distance to a virtual boundary; (3) the distance to a slice, tile or subpicture boundary with certain filtering rules; (4) etc.

[0194] In one example of this embodiment, which is illustrated in FIG. 7, the right edge of NN block 700 has a distance of D pixels from a subpicture boundary, and the right edge of the second column of NN block 700 has a distance of T pixels from the subpicture boundary, where D < T. In this example, two weight factors are assigned to NN block 700: a first weight factor $w_{SB1}$ (w1 in the figure) for all samples within the NN block whose distance to the subpicture boundary is less than T pixels, and a second weight factor $w_{SB2}$ (w2 in the figure) for all samples within the NN block whose distance to the subpicture boundary is greater than or equal to T pixels.
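The distance rule of [0194] can be sketched per column of the NN block. The geometry is an assumption: the subpicture boundary lies to the right of the block, the rightmost column is D pixels from it, so column x of a block of width W is D + (W - 1 - x) pixels away. The function names are hypothetical.

```python
# Sketch of [0194]: samples closer than T pixels to the subpicture boundary use
# w_SB1, all other samples use w_SB2. Assumed geometry: boundary to the right,
# rightmost column D pixels away, column x is D + (W - 1 - x) pixels away.


def weight_by_distance(distance, threshold, w_sb1, w_sb2):
    """Select w_SB1 when distance < T, otherwise w_SB2."""
    return w_sb1 if distance < threshold else w_sb2


def column_weights(width, d_right, threshold, w_sb1, w_sb2):
    """Weight factor for every column of the NN block, left to right."""
    return [
        weight_by_distance(d_right + (width - 1 - x), threshold, w_sb1, w_sb2)
        for x in range(width)
    ]
```

For a four-column block with D = 1 and T = 3, the two right-hand columns (distances 2 and 1) receive $w_{SB1}$ and the two left-hand columns (distances 4 and 3) receive $w_{SB2}$.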

[0195] In another example, illustrated in FIG. 8, there is a smooth weight transition area between $w_{SB1}$ and $w_{SB2}$, e.g., such that the weight for a pixel in position i, where i ranges from 1 to N-1 in the transition area, can be calculated as $w_{p_i} = \left(i \cdot w_{SB1} + (N-i) \cdot w_{SB2}\right) / N$. For instance, if $w_{SB2} > w_{SB1}$ and N = 3, then the weight factors $w_{p_1} = w_{SB1} + 2(w_{SB2} - w_{SB1})/3$ and $w_{p_2} = w_{SB1} + (w_{SB2} - w_{SB1})/3$ are applied to the pixels in the transition area (instead of $w_{SB2}$ and $w_{SB1}$ as in the example above). In this example, the transition area consists of the two middle pixel columns in the NN block, just before and after the threshold distance of T pixels from the subpicture boundary.
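The transition-area formula of [0195] translates directly into code. This sketch only evaluates the formula; the name `transition_weights` is hypothetical.

```python
# Sketch of [0195]: weights for the N - 1 pixels of the transition area,
# w_p_i = (i * w_SB1 + (N - i) * w_SB2) / N for i = 1 .. N - 1.


def transition_weights(w_sb1, w_sb2, n):
    """Return [w_p_1, ..., w_p_(N-1)], moving from near w_SB2 towards w_SB1."""
    if n < 2:
        return []
    return [(i * w_sb1 + (n - i) * w_sb2) / n for i in range(1, n)]
```

With $w_{SB1} = 0.25$, $w_{SB2} = 1$ and N = 3 the result is [0.75, 0.5], which agrees with $w_{p_1} = w_{SB1} + 2(w_{SB2} - w_{SB1})/3$ and $w_{p_2} = w_{SB1} + (w_{SB2} - w_{SB1})/3$. Passing `fractions.Fraction` values instead of floats gives exact results.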

[0196] Decoder 104 may perform all or a subset of the following steps for this embodiment to decode a picture:

[0197] Step 1:

[0198] Obtain weighting information for the NNLF to be applied for at least one NN block, wherein the weighting information specifies at least one boundary condition.

[0199] Step 2:

[0200] Determine a first weight factor based on the boundary condition.

[0201] Step 3:

[0202] Decoding the picture using the first weight factor for weighting between first-NNLF-output and non-first-NNLF-output.

[0203] 6. Rules for Weight Factors

[0204] In this embodiment, one or more rules are set for the weight factor(s). Rules may be (1) “assertive” rules — i.e., rules that set or modify weight factors if a certain condition holds or (2) “excluding” rules — i.e., rules that set or modify weight factors if certain conditions are not met.

[0205] In one example of this embodiment, a weighting rule is set for NN blocks which are located at the borders of the picture, here called border NN blocks. In this example, the weighting between the first-NNLF-output and the non-first-NNLF-output for border NN blocks is set to 1, which means that border NN blocks use only the first-NNLF-output.

[0206] In a variant of this embodiment, the weighting between the first-NNLF-output and the non-first-NNLF-output may be set to particular values or value ranges, or the NN loop filtering can be entirely turned off for a particular coding scheme such as a low delay coding scheme.

[0207] In one example, the rule is set for patches in relation to a virtual boundary. In this example, the weight factor for the first-NNLF-output is set equal to a first value (e.g., zero) when the NN block contains a virtual boundary, or the NN block is closer to a virtual boundary than a threshold of T pixels. In this example, the first weight factor for the first-NNLF-output may be signalled or be equal to a default value.

[0208] Another example of a rule for the weight factors is to apply a first set of weight factors if the NN block is from an intra coded CTU or CU, and to apply a second set of weight factors if the NN block is from an inter coded CTU or CU. In this example the first and the second set of weight factors could be predefined or signalled.

[0209] In another embodiment, a rule is set that specifies a delta value to be added to or subtracted from a signalled or otherwise determined weight factor. In one example, the weighting between the first-NNLF-output and the non-first-NNLF-output is determined as $w_{SB}$, but the rule specifies that if condition $C_1$ holds, then the weight factor that is actually applied is $w_{SB} + d_1$.
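Three of the rules above (border NN blocks, proximity to a virtual boundary, and a conditional delta) can be combined into one sketch. The order in which the rules are checked and the final clipping to [0, 1] are assumptions, as are the names `NNBlockInfo` and `apply_weight_rules`; the intra/inter rule of [0208] is left out for brevity.

```python
# Sketch of [0205], [0207] and [0209]: rules that override or adjust a
# signalled weight factor for the first-NNLF-output. Rule order and the
# clipping to [0, 1] are assumptions; names are hypothetical.
from dataclasses import dataclass


@dataclass
class NNBlockInfo:
    is_border: bool          # NN block lies on a picture border
    contains_vb: bool        # NN block contains a virtual boundary
    dist_to_vb: int          # pixels from the nearest virtual boundary


def apply_weight_rules(signalled_w, block, vb_threshold, vb_weight=0.0,
                       delta=0.0, condition_holds=False):
    """Return the weight factor actually applied to the first-NNLF-output."""
    if block.is_border:
        return 1.0  # border NN blocks use only the first-NNLF-output
    if block.contains_vb or block.dist_to_vb < vb_threshold:
        return vb_weight  # e.g. zero near a virtual boundary
    w = signalled_w + delta if condition_holds else signalled_w
    return min(max(w, 0.0), 1.0)  # keep the adjusted weight in range
```

For example, a signalled weight of 0.5 with $d_1 = 0.25$ and $C_1$ true yields 0.75 for an interior block away from any virtual boundary.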

[0210] Decoder 104 may perform all or a subset of the following steps for this embodiment to decode a picture:

[0211] Step 1:

[0212] Obtain weighting information for the NNLF to be applied for at least one NN block, wherein the weighting information specifies at least a first weight factor rule for use in determining a first weight factor.

[0213] Step 2:

[0214] Use the first rule to determine the first weight factor.

[0215] Step 3:

[0216] Decode the picture using the first weight factor for weighting between first-NNLF-output and non-first-NNLF-output.

[0217] 7. Blending Function

[0218] In this embodiment a blending is performed between the first-NNLF-output and the non-first-NNLF-output, and the blending is defined using a function. In one embodiment, the blending is a function of the position of the pixels in the NN block.

[0219] In one example, a 2-bit blending function is signalled, where the first bit specifies a horizontal gradient and the second bit specifies a vertical gradient for blending of the first-NNLF-output and the non-first-NNLF-output.

[0220] In another example, the blending function is signalled with 4 bits, where 2 bits are used for the horizontal dimension, and the other 2 bits are used for the vertical dimension. In one implementation, the four values of the 2-bit signalling refer to the change of the weights according to the table shown in FIG. 9, where white corresponds to the minimum weight factor value, and black corresponds to the maximum weight factor value in the range, and grey values are weight factors in between. FIG. 9 shows the values for one dimension (horizontal or vertical) and the gradient patterns can be combined with the other dimension to create 16 different weight patterns in 2D.

[0221] In one example of this embodiment, the weight factor assigned to the non-first-NNLF-output is equal to 1 minus the weight factor assigned to the first-NNLF-output. In another embodiment, this is not the case. For example, the first-NNLF-output may contain a feedforward branch which carries the non-filtered sample values to be added to the NN-filtered delta, thereby internally blending the first-NNLF-output with the non-first-NNLF-output within the NN. In case the first-NNLF-output does not contain the feedforward branch, the non-filtered sample values could be part of an external blending process, by waiting for the output from the NN and using the weights to blend the first-NNLF-output and the non-first-NNLF-output. In that case, the weight factor assigned to the non-first-NNLF-output has two parts: one part which is the same as the weight factor of the first-NNLF-output (and which is applied to the non-filtered sample values), and another part equal to 1 minus the weight factor, which is applied to the rest of the non-first-NNLF-output.

[0222] In one variant, the weighting is done linearly across the transition area. In another version of the embodiment, the weighting across the transition area is done in a non-linear way, for instance using a discrete S-shaped curve function such that the weighting is proportionally higher closer to the start and end of the function; e.g., the weight factors for the first-NNLF-output become {1, 6/7, 4/7, 3/7, 1/7, 0} for a transition area of six pixels.
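The two variants of [0222] can be compared on a six-pixel transition area. The linear ramp is one straightforward reading of "linearly"; the non-linear values are copied from the text. Function and constant names are hypothetical.

```python
# Sketch of [0222]: linear versus non-linear first-NNLF-output weights across a
# transition area. Exact fractions are used so the listed values compare exactly.
from fractions import Fraction


def linear_transition(length):
    """First-NNLF-output weights falling linearly from 1 to 0."""
    if length < 2:
        raise ValueError("a transition area needs at least two pixels")
    return [Fraction(length - 1 - k, length - 1) for k in range(length)]


# Non-linear weights for a six-pixel transition area, as listed in [0222].
NON_LINEAR_6 = [Fraction(1), Fraction(6, 7), Fraction(4, 7),
                Fraction(3, 7), Fraction(1, 7), Fraction(0)]


def blend_row(nnlf_row, other_row, weights):
    """Blend one row: w * first-NNLF-output + (1 - w) * non-first-NNLF-output."""
    return [w * a + (1 - w) * b for a, b, w in zip(nnlf_row, other_row, weights)]
```

For six pixels the linear ramp gives 1, 4/5, 3/5, 2/5, 1/5 and 0, whereas the listed non-linear weights take larger steps at some positions while keeping the same end points.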

[0223] In another variant of this embodiment, a baseline weight factor is signalled or set as a default, and the differences between the weight factors and that baseline weight factor are signalled.

[0224] In one version of this embodiment, the weight function is a piecewise linear function with the min and max values as the break points of the piecewise linear function. In yet another version of this embodiment the weight function is a piecewise function, but one or more pieces of the function are non-linear.
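A piecewise-linear weight function as in [0224] can be evaluated from a list of break points. In this sketch, reading "the min and max values as the break points" as the two positions where a clamped ramp reaches its minimum and maximum weight is an interpretation. Break-point positions are assumed to be strictly increasing, and the name `piecewise_linear` is hypothetical.

```python
# Sketch of [0224]: a piecewise-linear weight function given by break points
# (position, weight). Outside the first and last break point the weight is
# held constant, so two break points at the min and max weight form a clamped
# ramp. Positions are assumed to be strictly increasing.


def piecewise_linear(x, breakpoints):
    """Evaluate the weight at position x by linear interpolation."""
    if x <= breakpoints[0][0]:
        return breakpoints[0][1]
    for (x0, w0), (x1, w1) in zip(breakpoints, breakpoints[1:]):
        if x <= x1:
            return w0 + (w1 - w0) * (x - x0) / (x1 - x0)
    return breakpoints[-1][1]
```

With break points [(2, 0.0), (6, 1.0)], the weight is 0 up to position 2, rises to 0.5 at position 4, and stays at 1 from position 6 onwards. Replacing one segment with a non-linear curve would give the second version described above.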

[0225] Decoder 104 may perform all or a subset of the following steps for this embodiment to decode a picture:

[0226] Step 1:

[0227] Obtain weighting information for the NNLF to be applied for at least one NN block, wherein the weighting information specifies a function to determine a blending between the first-NNLF-output and the non-first-NNLF-output.

[0228] Step 2:

[0229] Use the function to determine one or more weight factors for blending between the first-NNLF-output and the non-first-NNLF-output.

[0230] Step 3:

[0231] Decode the picture using the determined one or more weight factors.

[0232] 8. Weight Factor Based on Block Size

[0233] In this embodiment, the size of the NN block determines the weighting between the first-NNLF-output and the non-first-NNLF-output. In one example, there are weights signalled for NN block sizes larger than a threshold, but for NN block sizes smaller than the threshold no weight factor is signalled and the weight factors are set to fixed values, such as zero for the first-NNLF-output and 1 for the non-first-NNLF-output.

[0234] In one embodiment, there is an NN block size threshold that specifies if the weighting is signalled or not, and the threshold itself is either derived or signalled. In one example the threshold used for determining whether the weight factor is signalled or not is derived from one or more of: (1) a tier and / or a level of the codec or as signalled in the bitstream; (2) a picture size (e.g., resolution); and (3) the number of CUs or NN blocks or patches in the picture.
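The size-dependent signalling of [0233] and [0234] can be sketched as a parse decision. The threshold is passed in rather than derived, because the text leaves the derivation open. Comparing block area, treating a block equal to the threshold as signalled, and the reader callable `read_weight` (standing in for a bitstream parser) are all assumptions.

```python
# Sketch of [0233]-[0234]: a weight factor is read only for NN blocks at or
# above an area threshold; smaller blocks use fixed weights (0 for the
# first-NNLF-output, 1 for the non-first-NNLF-output). read_weight is a
# hypothetical stand-in for a bitstream reader.


def weights_for_nn_block(width, height, area_threshold, read_weight):
    """Return (w_first_nnlf, w_non_first_nnlf) for one NN block."""
    if width * height < area_threshold:
        return 0.0, 1.0  # nothing is parsed for small blocks
    w = read_weight()
    return w, 1.0 - w
```

Because small blocks skip parsing entirely, the threshold trades signalling overhead against the ability to tune the weighting on fine structures.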

[0235] Decoder 104 may perform all or a subset of the following steps for this embodiment to decode a picture:

[0236] Step 1:

[0237] Obtain weighting information for the NNLF to be applied for at least one NN block, wherein the weighting information specifies a size (e.g., a width, a height, an area, an aspect ratio) or a size threshold of the NN block.

[0238] Step 2:

[0239] Use the size or the size threshold of the NN block to determine one or more weight factors.

[0240] Step 3:

[0241] Decode the picture using the determined one or more weight factors.

[0242] 9. Weighting with more than one NNLF-output Block

[0243] In this embodiment, there is more than one set of samples produced by NNLF 404. That is, in one embodiment, a set of first-NNLF-outputs is produced, and the blending is done using each first-NNLF-output included in the set of first-NNLF-outputs and the non-first-NNLF-output.

[0244] In one example, the weighting for the blending is done such that $w_{deblock} + w_{nn1} + \dots + w_{nnN} = 1$ and each of $\{w_{deblock}, w_{nn1}, \dots, w_{nnN}\}$ lies between 0 and 1, where $w_{deblock}$ is the weight factor for the non-first-NNLF-output, $w_{nn1}$ is the weight factor for the first set of NNLF-produced samples, and $w_{nnN}$ is the weight factor for the N-th set of NNLF-produced samples.
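The constrained multi-output blend of [0244] can be sketched for a single sample position. The name `blend_multi` and the tolerance used for the sum-to-one check (`math.isclose` defaults) are illustrative choices.

```python
# Sketch of [0243]-[0244]: blend one non-first-NNLF sample with N NNLF outputs,
# requiring every weight to lie in [0, 1] and all weights to sum to 1.
import math


def blend_multi(non_nnlf_sample, nnlf_samples, w_deblock, w_nn):
    """Return w_deblock * non-NNLF sample + sum over k of w_nnk * NNLF_k sample."""
    if len(nnlf_samples) != len(w_nn):
        raise ValueError("one weight is needed per NNLF output")
    weights = [w_deblock, *w_nn]
    if any(not 0.0 <= w <= 1.0 for w in weights):
        raise ValueError("weights must lie between 0 and 1")
    if not math.isclose(sum(weights), 1.0):
        raise ValueError("weights must sum to 1")
    return w_deblock * non_nnlf_sample + sum(
        w * s for w, s in zip(w_nn, nnlf_samples)
    )
```

With $w_{deblock} = 0.5$ and $w_{nn1} = w_{nn2} = 0.25$, a deblocked sample of 100 and NNLF samples of 120 and 60 blend to 50 + 30 + 15 = 95.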

[0245] In one version of this embodiment, one or more first-NNLF-outputs are selected from the set of first-NNLF-outputs and the blending is done using the first-NNLF-outputs selected from the set of first-NNLF-outputs and the non-first-NNLF-output.

[0246] Decoder 104 may perform all or a subset of the following steps for this embodiment to decode a picture:

[0247] Step 1:

[0248] Obtain weighting information for at least a first and a second neural network (NN) filter to be applied for at least one NN block, wherein the weighting information comprises a first weight factor for the first NN filter and a second weight factor for the second NN filter.

[0249] Step 2:

[0250] Decode the picture by performing weighting between the first NN loop filter output and the second NN loop filter output and the non-first-NNLF-output using the first weight factor and the second weight factor.

[0251] 10. Conditional Weighting

[0252] In this embodiment one or more syntax elements, e.g., one or more flags and / or index values, are signalled for controlling whether or not to do the weighting (blending) between the filtering outputs in any of the previous embodiments. The signalling may be done for a certain specified part of the bitstream, e.g., a CU, a CTU, a slice, a subpicture, a picture, a group of pictures, or the whole video sequence.

[0253] The one or more syntax elements for controlling whether to do the weighting for samples in a certain specified part of the bitstream may be signalled in a parameter set such as a VPS, SPS, PPS, or APS, in a header such as a picture header or a slice header, or per CU or CTU in the slice data. In one example, a syntax element, e.g., nn_weight_idc, may be signalled that controls how a first-NNLF-output is to be blended with a non-first-NNLF-output for a certain specified part of the bitstream, e.g., a CU, a CTU, a slice, a subpicture, a picture, a group of pictures, or the whole video sequence.

[0254] In one embodiment, an nn_weight_idc value of 0 means that the weighting is not done and the non-first-NNLF-output samples are used for the specified part of the bitstream.

[0255] In one embodiment, an nn_weight_idc value of 1 means that the weighting is not done and the first-NNLF-output samples are used for the specified part of the bitstream.

[0256] In one embodiment, an nn_weight_idc value of 2 means that the weighting is done with a first specific type of blending, e.g., a boundary-dependent blending such as described above.

[0257] In one embodiment, an nn_weight_idc value of 3 means that the weighting is done with a second specific type of blending, e.g., a non-boundary-dependent blending.
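The four nn_weight_idc values above can be sketched as a dispatch for one part of the bitstream. The two blending callables are placeholders for whichever boundary-dependent and non-boundary-dependent blending is in use; the function name `apply_nn_weight_idc` is hypothetical.

```python
# Sketch of [0254]-[0257]: dispatching on nn_weight_idc. boundary_blend and
# plain_blend are hypothetical placeholders for the two blending types.


def apply_nn_weight_idc(idc, nnlf_out, non_nnlf_out, boundary_blend, plain_blend):
    """Return the output samples selected or blended according to idc."""
    if idc == 0:
        return list(non_nnlf_out)  # no weighting, non-first-NNLF-output used
    if idc == 1:
        return list(nnlf_out)      # no weighting, first-NNLF-output used
    if idc == 2:
        return boundary_blend(nnlf_out, non_nnlf_out)
    if idc == 3:
        return plain_blend(nnlf_out, non_nnlf_out)
    raise ValueError(f"unsupported nn_weight_idc value {idc}")
```

Values 0 and 1 let an encoder switch the blending off cheaply for a whole region, while values 2 and 3 select between blending types without sending individual weights.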

[0258] In one embodiment, decoder 104 may perform all or a subset of the following steps for this embodiment to decode a picture:

[0259] Step 1:

[0260] Obtain weighting information for the NNLF to be applied for at least one NN block, wherein the weighting information comprises an indicator indicating whether or not to apply a weighting between the first-NNLF-output and the non-first-NNLF-output.

[0261] Step 2:

[0262] Decode the picture using the weighting between the first-NNLF-output and the non-first-NNLF-output if the indicator value is equal to a first value.

[0263] Step 3:

[0264] Decode the picture without using the weighting between the first-NNLF-output and the non-first-NNLF-output if the indicator value is equal to a second value.

[0265] In another embodiment, decoder 104 may perform all or a subset of the following steps for this embodiment to decode a picture:

[0266] Step 1:

[0267] Obtain weighting information for the NNLF to be applied for at least one NN block, wherein the weighting information comprises an indicator indicating a type of weighting to apply between the first-NNLF-output and the non-first-NNLF-output.

[0268] Step 2:

[0269] Decode the picture using a first type of weighting between the first-NNLF-output and the non-first-NNLF-output if the indicator value is equal to a first value.

[0270] Step 3:

[0271] Decode the picture using a second type of weighting between the first-NNLF-output and the non-first-NNLF-output if the indicator value is equal to a second value.

[0272] FIG. 10 is a flowchart illustrating a process 1000 for decoding a picture from a video bitstream. Process 1000 may begin in step s1002. Step s1002 comprises (i.e., includes at least) obtaining weighting information including at least a first weighting information component and a second weighting information component specifying one or more weight factors (i.e., at least a first weight factor) specifying a weighting between first-NNLF-output and non-first-NNLF-output. For example, the first weighting information component is or comprises the first weight factor. As another example, the first weight factor may be derived: i) using only the first weighting information component, ii) using only the second weighting information component, iii) using only the first weighting information component and the second weighting information component, or iv) using the first weighting information component, the second weighting information component, and other information. The one or more weight factors include at least a first weight factor, and the first-NNLF-output is produced using a first neural network (NN) loop filter (NNLF) based on an NN block of an input block of the picture. Step s1004 comprises decoding the picture using the first and the second weighting information components.
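One possible derivation of the first weight factor from two components, in the spirit of option iii) above, is sketched below: a global weight factor refined by an optional local offset. Combining the two by addition and clipping the result to [0, 1] are assumptions; the text does not fix the combination rule. The name `derive_first_weight` is hypothetical.

```python
# Sketch of a derivation for [0272]: the first weight factor comes from a global
# component, optionally refined by a local offset. Addition plus clipping to
# [0, 1] is an assumed combination rule.


def derive_first_weight(global_w, local_offset=None):
    """Use the global weight alone, or refine it with a local offset."""
    if local_offset is None:
        return global_w  # only the first component is used
    return min(max(global_w + local_offset, 0.0), 1.0)  # both components used
```

A global weight of 0.5 with a local offset of 0.25 gives 0.75; the clipping keeps a large offset from producing a weight outside the valid range.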

[0273] FIG. 11 is a flowchart illustrating a process 1100 for producing a video bitstream. Process 1100 may begin in step s1102. Step s1102 comprises obtaining weighting information including at least a first weighting information component and a second weighting information component specifying one or more weight factors specifying a weighting between first-NNLF-output and non-first-NNLF-output. The one or more weight factors include at least a first weight factor, and the first-NNLF-output is produced using a first neural network (NN) loop filter (NNLF) based on an NN block of an input block of a picture. Step s1104 comprises including the weighting information in the bitstream.
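The text does not say how the encoder obtains the weighting information in step s1102. One common approach, assumed here rather than taken from the source, is to try each candidate weight factor and keep the one whose blend is closest to the original samples, then write its index to the bitstream. The sum-of-squared-errors criterion and the name `choose_weight_index` are illustrative; a real encoder would typically also account for signalling cost.

```python
# Sketch of an encoder-side choice for [0273]: pick the candidate weight factor
# whose blend has the lowest sum of squared errors against the original
# samples. The SSE criterion is an assumption, not taken from the text.


def choose_weight_index(original, nnlf_out, non_nnlf_out, candidates):
    """Return the index of the candidate weight giving the lowest SSE."""
    def sse(w):
        return sum(
            (o - (w * a + (1 - w) * b)) ** 2
            for o, a, b in zip(original, nnlf_out, non_nnlf_out)
        )
    return min(range(len(candidates)), key=lambda i: sse(candidates[i]))
```

For original samples [10, 20], NNLF samples [12, 18] and deblocked samples [8, 22], the weight 0.5 reproduces the original exactly, so its index is selected from the candidates 0, 0.25, 0.5, 0.75 and 1.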

[0274] FIG. 12 is a block diagram of an apparatus 1200 for implementing encoder 102, device 103, and / or decoder 104, according to some embodiments. When apparatus 1200 implements encoder 102, apparatus 1200 may be referred to as an encoder apparatus, and when apparatus 1200 implements decoder 104, apparatus 1200 may be referred to as a decoder apparatus. As shown in FIG. 12, apparatus 1200 may comprise: processing circuitry (PC) 1202, which comprises one or more processors (P) 1255 (e.g., one or more general purpose microprocessors and / or one or more other processors, such as an application specific integrated circuit (ASIC), field-programmable gate arrays (FPGAs), and the like), which processors may be co-located in a single housing or in a single data center or may be geographically distributed (e.g., apparatus 1200 may be a distributed, cloud computing system comprising two or more computers or a monolithic computing system consisting of a single computer); at least one network interface 1248 (e.g., a physical interface or air interface) comprising a transmitter (Tx) 1245 and a receiver (Rx) 1247 for enabling apparatus 1200 to transmit data to and receive data from other nodes connected to a network 120 (e.g., an Internet Protocol (IP) network) to which network interface 1248 is connected (physically or wirelessly) (e.g., network interface 1248 may be coupled to an antenna arrangement comprising one or more antennas for enabling encoder apparatus 1200 to wirelessly transmit / receive data); and a storage unit (a.k.a., “data storage system”) 1208, which may include one or more non-volatile storage devices and / or one or more volatile storage devices. In embodiments where PC 1202 includes a programmable processor, a computer readable storage medium (CRSM) 1242 may be provided. CRSM 1242 may store a computer program (CP) 1243 comprising computer readable instructions (CRI) 1244. 
CRSM 1242 may be a non-transitory computer readable medium, such as magnetic media (e.g., a hard disk), optical media, memory devices (e.g., random access memory, flash memory), and the like. In some embodiments, the CRI 1244 of computer program 1243 is configured such that when executed by PC 1202, the CRI causes encoder apparatus 1200 to perform steps described herein (e.g., steps described herein with reference to the flow charts). In other embodiments, encoder apparatus 1200 may be configured to perform steps described herein without the need for code. That is, for example, PC 1202 may consist merely of one or more ASICs. Hence, the features of the embodiments described herein may be implemented in hardware and / or software.

[0275] Summary of Various Embodiments

A1. A method (1000) for decoding a picture from a bitstream, the method comprising: obtaining weighting information including at least a first weighting information component and a second weighting information component specifying one or more weight factors specifying a weighting between first-NNLF-output and non-first-NNLF-output, wherein the one or more weight factors include at least a first weight factor, and the first-NNLF-output is produced using a first neural network, NN, loop filter, NNLF, based on an NN block of an input block of the picture; and decoding the picture using the first and the second weighting information components.

B1. A method (1100) for producing a bitstream, the method comprising: obtaining weighting information including at least a first weighting information component and a second weighting information component specifying one or more weight factors specifying a weighting between first-NNLF-output and non-first-NNLF-output, wherein the one or more weight factors include at least a first weight factor, and the first-NNLF-output is produced using a first neural network, NN, loop filter, NNLF, based on an NN block of an input block of a picture; and including the weighting information in the bitstream.

A2. The method of embodiment A1 or B1, wherein the first weighting information component comprises: the first weight factor; a weight factor offset for use in deriving the first weight factor; a relative weight factor; information specifying a weight factor granularity; a list of candidate weight factors, the candidate weight factors including at least the first weight factor; an index pointing to the first weight factor, wherein the first weight factor is included in a list of candidate weight factors; information specifying values a weight factor can have; information specifying a function defining the transition of a weight factor over an area of samples; information specifying a function specifying the blending of the NN filtered output and the second filter output; information specifying a rule or constraint that specifies if and / or how to apply a weight factor; information specifying if the first weight factor is present in the bitstream; an indicator indicating whether to do a weighting of the filtering or not; and / or information specifying a boundary condition such as the distance of the NN block to a boundary being smaller than a threshold value.

A3. The method of embodiment A1 or A2, wherein the weighting information further specifies a second weight factor, wherein the first and second weight factors are weight factors for a channel, decoding the picture using the first and the second weighting information components comprises decoding the picture using the first and the second weight factors, and decoding the picture using the first and the second weight factors comprises: applying the first weight factor to a first value in the first-NNLF-output, and applying the second weight factor to a second value in the first-NNLF-output.

A4. The method of any previous embodiment, wherein the non-first-NNLF-output includes at least one of: a deblocking filtered output, an ALF filtered output, a SAO filtered output, a bilateral filtered output, a Hadamard filtered output, an AV1 filter type filtered output, an AV2 filter type filtered output, other non-NN in-loop filter filtered output, a second NN filter output, or an unfiltered output.

A5. The method of any previous embodiment, wherein at least some of the weighting information is decoded from one or more syntax elements in the bitstream.

A6. The method of any previous embodiment, wherein the weighting information is decoded from a parameter set (e.g., a VPS, an SPS, a PPS, an APS) or a header (e.g., a picture header or a slice header).

A7. The method of any previous embodiment, wherein the weighting information applies to one or more of the following: a group of pictures, all portions of the picture, one or more portions of the picture (e.g., one or more slices, one or more subpictures, one or more tiles, one or more regions-of-interest), a specific portion of the picture (e.g., an NN block, a CTU or a block of the picture with specified size), or a specific sub-block of the picture (e.g., a CU in a CTU).

A8. The method of embodiment A7, wherein at least one of the following applies: at least two blocks have different weight factors, and at least two sub-blocks have different weight factors.

A9. The method of any previous embodiment, wherein the first weighting information component comprises an index pointing to a specific weight factor in a list of candidate weight factors, and the first weight factor is the specific weight factor.

A10. The method of any previous embodiment, wherein the first weight factor is derived based on one or more of: a sample value (e.g., a sample value in the CTU, the CU, the NN block or sub-block), a weight factor for a neighbouring NN block or sub-block, a weight factor of a collocated NN block or sub-block, or an offset value and a second weight factor.

A11. The method of any one of embodiments A1-A10, wherein the first weighting information component comprises a global weight factor, and the method further comprises: determining whether the global weight factor is to be used for at least a portion of the NN block, and, in response to determining that the global weight factor is not to be used for at least a portion of the NN block, decoding the second weighting information component, wherein the second weighting information component specifies the first weight factor.

A12. The method of any previous embodiment, wherein the first weighting information component comprises a global weight factor, the second weighting information component comprises a local weight factor, and the method further comprises deriving the first weight factor based on the global weight factor and the local weight factor.

A13. The method of embodiment A11 or A12, wherein the global weight factor is signalled for one or more of: a group of pictures comprising the picture, the picture, or a portion of the picture (e.g., a slice, a subpicture, a tile, a CTU, a CU, an NN block).

A14. The method of any previous embodiment, wherein the first weight factor is applied to a first sub-block of the NN block, the one or more weight factors further include at least a second weight factor, and the second weight factor is applied to a second sub-block of the NN block.

A15. The method of embodiment A14, wherein the first sub-block or the second sub-block is aligned with a block in a CU split level.

A16. The method of any previous embodiment, wherein the NN block is aligned with a CTU.

A17. The method of any previous embodiment, wherein the first weight factor is applied to a first channel, c1, of the NN block, the one or more weight factors further include at least a second weight factor, the second weight factor is applied to a second channel, c2, of the NN block, and channel c1 and channel c2 are not the same.

A18. The method of any previous embodiment, wherein the weighting information further comprises information specifying the number of weight factors specified by the weighting information.

A19. The method of any previous embodiment, wherein the first weighting information component comprises weight factor granularity information.

A20. The method of embodiment A19, wherein the weight factor granularity information defines at least a first level to which the first weight factor is applied, wherein the first level is one of: a temporal layer, a group of pictures, a picture, a subpicture, a slice, a tile, a CTU, a specific CU split level, a CU, an NN block, or a sub-block.

A21. The method of any previous embodiment, wherein the first weighting information component comprises candidate weight factor information specifying a set of candidate weight factors.

A22. The method of embodiment A21, wherein the candidate weight factor information comprises: information specifying the number of candidate weight factors included in the set of candidate weight factors, or information specifying a number of weight factor steps (e.g., with number of steps N=4, the possible values for the weight factor may be 0, 0.25, 0.5, 0.75 and 1).

A23. The method of embodiment A21, wherein the candidate weight factor information comprises: information specifying a distribution of the candidate weight factors (e.g., whether the possible weight factors are equally distant from each other or follow a logarithmic distribution), or information specifying a distance between at least two consecutive candidate weight factors (e.g., the possible weight factors are 0.25 apart).

A24.
The method of any previous embodiment, wherein the first weight factor is determined based on at least one boundary condition related to a boundary.A25. The method of embodiment A24, wherein the boundary is at least one of: a picture boundary, a CTU boundary, a CU boundary, a block boundary,a virtual boundary (e.g., a virtual boundary within a CTU or within a known distance from a CTU, a virtual boundary within an NN block or within a known distance from the NN block), a boundary of the input NN block, a boundary of a tile, slice or subpicture or other picture partitions, a gradual decoder refresh boundary (e.g., a boundary between a refreshed and nonrefreshed area), or a boundary of a region-of-interest (ROI).A26. The method of any of embodiments A24 and A25, wherein the boundary condition comprises at least one of: a distance D being smaller than a threshold T, a distance D being larger than a threshold T, or a distance D being equal to a threshold T.A27. The method of any previous embodiment, wherein there is a transition of weight factors over an area of samples.A28. The method of any previous embodiment, wherein at least a first rule is used to determine whether and / or how to apply the first weight factor, and the first rule is based on at least one of: a location of the NN block (e.g., for NN blocks located at the border of the picture only use NN filtered output), a type of coding scheme (e.g., turn off for low delay), a presence of a boundary (e.g., apply a first set of weight factors if there is a virtual boundary inside the NN block), a slice type (e.g., apply a first set of weight factors if the input patch is from an intra coded CTU and apply a second set of weight factors if the input patch is from an inter coded CTU), ora delta value (e.g., subtract the delta value from or add the delta value to the weight factor if a certain condition exists).A29. 
The method of any previous embodiment, wherein the weighting information further comprises one or more syntax elements describing a function for blending of the first- NNLF-output and the non-first-NNLF -output.A30. The method of embodiment A29, wherein the function is one or more of: a function of the position of the values in the NN block, a linear function, a piecewise linear function, or a nonlinear function.A31. The method of any previous embodiment, wherein the size of the NN block determines the weighting between the first-NNLF- output and the non-first-NNLF-outputA32. The method of any previous embodiment, wherein the weighting information comprises a component specifying an NN block size threshold, and the NN block size threshold is based on: a tier and / or level value, a picture size (resolution), or a number of CUs or NN blocks or patches in the picture.A33. The method of any previous embodiment, wherein the first weight factor is determined based on an NN block size or an NN block size threshold.A34. The method of any previous embodiment, wherein a second-NNLF-output is produced using i) the first NNLF based on the NN block or ii) a second NNLF based on the NN block, andthe weighting information further specifies one or more weight factors specifying a weighting between the first-NNLF-output and the second-NNLF -output.A35. The method of any previous embodiment, wherein a second-NNLF-output is produced using the first NNLF based on the NN block, and a third-NNLF-output is produced using i) the first NNLF based on the NN block or ii) a second NNLF based on the NN block, and the method further comprises: producing the first-NNLF-output by blending the second-NNLF-output with the third- NNLF-output; and producing blended output by blending the first -NNLF-output with the non-first-NNLF- output using the first weight factor.A36. 
The method of any previous embodiment, wherein the weighting information further comprises an indicator indicating whether or not to do a weighting between the first- NNLF-output and the non-first-NNLF -output.A37. The method of embodiment A36, wherein the indicator has a scope, and the scope of the indicator is a NN block, a CU, a CTU, a tile, a slice, a subpicture, a picture, a group of pictures, or a sequence.Cl. A computer program (1243) comprising instructions (1244) which when executed by processing circuitry (1202) of an apparatus (1200) causes the apparatus to perform the method of any one of the above embodiments.C2. A carrier containing the computer program of embodiment Cl, wherein the carrier is one of an electronic signal, an optical signal, a radio signal, and a computer readable storage medium (1242).DI. A decoding apparatus (1200) for decoding a picture from a bitstream, the decoding apparatus being configured to: obtain weighting information including at least a first weighting information component and a second weighting information component specifying one or more weight factors specifying a weighting between first-NNLF -output and non-first-NNLF -output, wherein the one or more weight factors include at least a first weight factor, and the first-NNLF-output is produced using a first neural network, NN, loop filter, NNLF, based on an NN block of an input block of the picture; and decode the picture using the first and the second weighting information components.D2. The decoding apparatus of embodiment DI, wherein the decoding apparatus is further configured to perform the method of any one of embodiments A2-A37.El. 
An encoding apparatus (1200) for producing a bitstream, the encoding apparatus being configured to: obtain weighting information including at least a first weighting information component and a second weighting information component specifying one or more weight factors specifying a weighting between first-NNLF-output and non-first-NNLF -output, wherein the one or more weight factors include at least a first weight factor, and the first-NNLF-output is output produced using a first neural network, NN, loop filter, NNLF, based on an NN block of an input block of a picture; and include the weighting information in the bitstream.E2. The encoding apparatus of embodiment El, wherein the encoding apparatus is further configured to perform the method of any one of embodiments A2, A4-A10, or A12-A37.
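To make the basic weighting of embodiments A1 and A3 concrete, the following is a minimal sketch of blending the first-NNLF-output with the non-first-NNLF-output for one NN block. It assumes a convex combination `w * nn + (1 - w) * other`, evaluated in integer arithmetic with the weight expressed as `w_num / 2**shift` (so `shift=2` yields the candidate values 0, 0.25, 0.5, 0.75 and 1 from the N=4 example in A22), followed by clipping to the sample range. The function name `blend_block`, the fixed-point representation and the rounding offset are illustrative assumptions, not the disclosed syntax or decoding process.

```python
def blend_block(nn_out, other_out, w_num, shift=2, bit_depth=10):
    """Blend first-NNLF-output with non-first-NNLF-output for one NN block.

    The weight factor is w_num / 2**shift (hypothetical fixed-point form).
    w_num == 2**shift keeps only the NN loop filter output,
    w_num == 0 keeps only the non-first-NNLF-output.
    """
    denom = 1 << shift
    if not 0 <= w_num <= denom:
        raise ValueError("weight numerator out of range")
    offset = denom >> 1  # rounding offset applied before the right shift
    max_val = (1 << bit_depth) - 1
    blended = []
    for nn_row, other_row in zip(nn_out, other_out):
        blended.append([
            # integer weighted sum, then clip to the valid sample range
            min(max((w_num * a + (denom - w_num) * b + offset) >> shift, 0), max_val)
            for a, b in zip(nn_row, other_row)
        ])
    return blended


print(blend_block([[100, 200]], [[120, 180]], 3))  # weight factor 0.75
```

With a weight factor of 0.75 the call prints `[[105, 195]]`. Integer arithmetic is chosen here only because decoders typically avoid floating point in normative sample processing; a floating-point formulation would be equally valid for illustration.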

[0276] While the terminology in this disclosure is described in terms of VVC, the embodiments of this disclosure also apply to any existing or future codec, which may use different but equivalent terminology.

[0277] While various embodiments are described herein, it should be understood that they have been presented by way of example only, and not limitation. Thus, the breadth and scope of this disclosure should not be limited by any of the above-described exemplary embodiments. Moreover, any combination of the above-described elements in all possible variations thereof is encompassed by the disclosure unless otherwise indicated herein or otherwise clearly contradicted by context.

[0278] Additionally, while the processes described above and illustrated in the drawings are shown as a sequence of steps, this was done solely for the sake of illustration. Accordingly, it is contemplated that some steps may be added, some steps may be omitted, the order of the steps may be re-arranged, and some steps may be performed in parallel.

Claims

1. A method (1000) for decoding a picture (600) from a video bitstream, the method comprising: obtaining (s1002) weighting information including at least a first weighting information component and a second weighting information component specifying one or more weight factors specifying a weighting between first-neural network loop filter, NNLF, -output and non-first-NNLF-output, wherein the one or more weight factors include at least a first weight factor, and the first-NNLF-output is produced using a first NNLF (404), based on a neural network, NN, block (504; 700) of an input block (502) of the picture (600); and decoding (s1004) the picture (600) using the first and the second weighting information components.

2. The method (1000) of claim 1, wherein the first weighting information component comprises at least one of: the first weight factor, a weight factor offset for use in deriving the first weight factor, a relative weight factor, information specifying a weight factor granularity, a list of candidate weight factors, the candidate weight factors including at least the first weight factor, an index pointing to the first weight factor, wherein the first weight factor is included in a list of candidate weight factors, information specifying values a weight factor can have, information specifying a function defining the transition of a weight factor over an area of sample values, information specifying a function specifying the blending of the first-NNLF-output and the non-first-NNLF-output, information specifying a rule or constraint that specifies if and / or how to apply a weight factor, information specifying if the first weight factor is present in the bitstream, an indicator indicating whether to do a weighting of the filtering or not, and information specifying a boundary condition such as the distance (D) of the NN block (700) to a boundary being smaller than a threshold value (T).

3. The method (1000) of claim 1 or 2, wherein the weighting information further specifies a second weight factor, wherein the first and second weight factors are weight factors for a channel, decoding (s1004) the picture (600) using the first and the second weighting information components comprises decoding the picture using the first and the second weight factors, and decoding (s1004) the picture (600) using the first and the second weight factors comprises: applying the first weight factor to a first value in the first-NNLF-output, and applying the second weight factor to a second value in the first-NNLF-output.

4. The method (1000) of any of claims 1 to 3, wherein the non-first-NNLF-output includes at least one of: a deblocking filtered output, an adaptive loop filter, ALF, filtered output, a sample adaptive offset, SAO, filtered output, a bilateral filtered output, a Hadamard filtered output, an AV1 filter type filtered output, an AV2 filter type filtered output, other non-NN in-loop filter filtered output, a second NN filter output, or an unfiltered output.

5. The method (1000) of any of claims 1 to 4, wherein at least some of the weighting information is decoded from one or more syntax elements in the bitstream.

6. The method (1000) of any of claims 1 to 5, wherein the weighting information is decoded from a parameter set or a header.

7. The method (1000) of any of claims 1 to 6, wherein the weighting information applies to one or more of the following: a group of pictures, all portions of the picture, one or more portions of the picture, a specific portion of the picture (600), or a specific sub-block of the picture (600).

8. The method (1000) of claim 7, wherein at least one of the following applies: at least two blocks have different weight factors, and at least two sub-blocks have different weight factors.

9. The method (1000) of any of claims 1 to 8, wherein the first weighting information component comprises an index pointing to a specific weight factor in a list of candidate weight factors, and the first weight factor is the specific weight factor.

10. The method (1000) of any of claims 1 to 9, wherein the first weight factor is derived based on one or more of: a sample value, a weight factor for a neighbouring NN block or sub-block, a weight factor of a collocated NN block or sub-block, or an offset value and a second weight factor.

11. The method (1000) of any one of claims 1 to 10, wherein the first weighting information component comprises a global weight factor, and the method further comprises: determining whether the global weight factor is to be used for at least a portion of the NN block (504; 700), and in response to determining that the global weight factor is not to be used for at least a portion of the NN block (504; 700), decoding the second weighting information component, wherein the second weighting information specifies the first weight factor.

12. The method (1000) of any of claims 1 to 11, wherein the first weighting information component comprises a global weight factor, the second weighting information component comprises a local weight factor, and the method further comprises deriving the first weight factor based on the global weight factor and the local weight factor.
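Claims 11 and 12 leave open how the global and local components combine. One possible reading is sketched below under a loudly stated assumption: the local weighting component is a signed offset applied to the index of the global weight factor in a candidate list, clipped to the list bounds, and it is only consulted when the global weight factor is not used for the NN block. The function `derive_weight` and its parameters are hypothetical and serve only as an illustration.

```python
from typing import Optional, Sequence


def derive_weight(candidates: Sequence[float], global_idx: int,
                  use_global: bool, local_delta: Optional[int] = None) -> float:
    """Derive the first weight factor from a global and an optional local component.

    Hypothetical scheme: the local component is a signed index offset
    relative to the global candidate index, clipped to the list bounds.
    """
    if use_global:
        # Global weight applies to the NN block, so no local component is needed.
        return candidates[global_idx]
    if local_delta is None:
        raise ValueError("local component required when the global weight is not used")
    idx = min(max(global_idx + local_delta, 0), len(candidates) - 1)
    return candidates[idx]


cands = [0.0, 0.25, 0.5, 0.75, 1.0]
print(derive_weight(cands, 2, True), derive_weight(cands, 2, False, 1))
```

The call prints `0.5 0.75`. An index offset keeps the local component small to signal; adding a value offset directly to the global weight factor would be an equally plausible alternative.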

13. The method (1000) of claim 11 or 12, wherein the global weight factor is signalled for one or more of: a group of pictures comprising the picture, the picture (600), a portion of the picture (600).

14. The method (1000) of any of claims 1 to 13, wherein the first weight factor is applied to a first sub-block of the NN block (504; 700), the one or more weight factors further include at least a second weight factor, and the second weight factor is applied to a second sub-block of the NN block (504; 700).
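Claim 14 applies different weight factors to different sub-blocks of one NN block. A minimal sketch, assuming square sub-blocks of a fixed size and a floating-point convex blend per sample, looks up the weight of the sub-block that covers each sample position. The names `blend_per_subblock`, `sub_weights` and `sub_size` are hypothetical.

```python
def blend_per_subblock(nn_out, other_out, sub_weights, sub_size):
    """Blend an NN block using one weight factor per square sub-block.

    sub_weights[i][j] is the (hypothetical) weight factor of the sub-block in
    sub-block row i and column j; sub_size is the sub-block width and height.
    """
    blended = []
    for y, (nn_row, other_row) in enumerate(zip(nn_out, other_out)):
        row = []
        for x, (a, b) in enumerate(zip(nn_row, other_row)):
            w = sub_weights[y // sub_size][x // sub_size]  # weight of covering sub-block
            row.append(round(w * a + (1.0 - w) * b))
        blended.append(row)
    return blended


nn = [[10, 10, 10, 10], [10, 10, 10, 10]]
other = [[20, 20, 20, 20], [20, 20, 20, 20]]
print(blend_per_subblock(nn, other, [[1.0, 0.5]], 2))
```

Here the left 2x2 sub-block keeps the NN output and the right one is an even mix, so the call prints `[[10, 10, 15, 15], [10, 10, 15, 15]]`. Aligning sub-blocks with a CU split level, as in claim 15, would replace the fixed `sub_size` grid with the actual CU partitioning.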

15. The method (1000) of claim 14, wherein the first sub-block or the second sub-block is aligned with a block at a coding unit, CU, split level.

16. The method (1000) of any of claims 1 to 15, wherein the NN block (504; 700) is aligned with a coding tree unit, CTU.

17. The method (1000) of any of claims 1 to 16, wherein the first weight factor is applied to a first channel, c1, of the NN block (504; 700), the one or more weight factors further include at least a second weight factor, the second weight factor is applied to a second channel, c2, of the NN block (504; 700), and channel c1 and channel c2 are not the same.
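For claim 17, the weighting can be pictured as one weight factor per colour channel. The sketch below assumes the NN block is held as a mapping from a channel name (for example `"Y"` or `"Cb"`) to a 2-D list of samples and blends each plane with its own weight using a floating-point convex combination. The function name and the data layout are illustrative assumptions.

```python
def blend_channels(nn_planes, other_planes, channel_weights):
    """Blend each colour channel of an NN block with its own weight factor.

    nn_planes / other_planes map a channel name (e.g. "Y", "Cb", "Cr") to a
    2-D list of samples; channel_weights maps the same names to weights.
    """
    blended = {}
    for ch, w in channel_weights.items():
        blended[ch] = [
            [round(w * a + (1.0 - w) * b) for a, b in zip(nn_row, other_row)]
            for nn_row, other_row in zip(nn_planes[ch], other_planes[ch])
        ]
    return blended


nn = {"Y": [[100]], "Cb": [[60]]}
other = {"Y": [[80]], "Cb": [[40]]}
print(blend_channels(nn, other, {"Y": 0.75, "Cb": 0.25}))
```

The call prints `{'Y': [[95]], 'Cb': [[45]]}`: luma leans towards the NN output while chroma leans towards the non-first-NNLF-output, which is the kind of channel-dependent trade-off the claim permits.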

18. The method (1000) of any of claims 1 to 17, wherein the weighting information further comprises information specifying the number of weight factors specified by the weighting information.

19. The method (1000) of any of claims 1 to 18, wherein the first weighting information component comprises weight factor granularity information.

20. The method (1000) of claim 19, wherein the weight factor granularity information defines at least a first level to which the first weight factor is applied, wherein the first level is one of: a temporal layer, a group of pictures, a picture, a subpicture, a slice, a tile, a CTU, a specific CU split level, a CU, an NN block, or a sub-block.

21. The method (1000) of any of claims 1 to 20, wherein the first weighting information component comprises candidate weight factor information specifying a set of candidate weight factors.

22. The method (1000) of claim 21, wherein the candidate weight factor information comprises: information specifying the number of candidate weight factors included in the set of candidate weight factors, or information specifying a number of weight factor steps.

23. The method (1000) of claim 21, wherein the candidate weight factor information comprises: information specifying a distribution of the candidate weight factors, or information specifying a distance between at least two consecutive candidate weight factors.
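Claims 9 and 21 to 23 describe a candidate set of weight factors that a decoded index selects from. The sketch below shows three ways such a set could be built: N equal steps over [0, 1] (N=4 gives 0, 0.25, 0.5, 0.75 and 1, matching the example given for embodiment A22), a fixed distance between consecutive candidates, and a logarithmic distribution. The restriction to [0, 1], the powers-of-two form of the logarithmic set and all function names are assumptions for illustration; `Fraction` is used so that candidate values are exact.

```python
from fractions import Fraction


def candidates_from_steps(num_steps):
    """N equal steps over [0, 1] give N + 1 candidates (N=4 -> 0, 1/4, ..., 1)."""
    return [Fraction(i, num_steps) for i in range(num_steps + 1)]


def candidates_from_distance(distance):
    """Equally spaced candidates over [0, 1] with the given spacing."""
    steps = Fraction(1) / Fraction(distance)
    if steps.denominator != 1:
        raise ValueError("distance must divide 1 evenly")
    return candidates_from_steps(steps.numerator)


def candidates_logarithmic(num_values):
    """Hypothetical logarithmic set: 1/2**(n-1), ..., 1/4, 1/2, 1."""
    return [Fraction(1, 2 ** k) for k in reversed(range(num_values))]


def select_weight(candidates, index):
    """Resolve a decoded index to a weight factor in the candidate list."""
    return candidates[index]


print([float(c) for c in candidates_from_steps(4)])
```

The call prints `[0.0, 0.25, 0.5, 0.75, 1.0]`, and `candidates_from_distance("0.25")` yields the same list. Signalling only N (or the distance) plus an index per block keeps the per-block cost to a few bits.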

24. The method (1000) of any of claims 1 to 23, wherein the first weight factor is determined based on at least one boundary condition related to a boundary.

25. The method (1000) of claim 24, wherein the boundary is at least one of: a picture boundary, a CTU boundary, a CU boundary, a block boundary, a virtual boundary, a boundary of the input NN block, a boundary of a tile, slice or subpicture or other picture partitions, a gradual decoder refresh boundary, or a boundary of a region-of-interest, ROI.

26. The method (1000) of claim 24 or 25, wherein the boundary condition comprises at least one of: a distance D being smaller than a threshold T, a distance D being larger than a threshold T, or a distance D being equal to a threshold T.
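As one hedged illustration of claims 24 to 26, the sketch below measures the distance D from an NN block to the nearest picture boundary and switches to a different weight factor when D < T. Only the picture boundary is handled, and which weight applies near the boundary is a hypothetical encoder or specification choice passed in as a parameter; the function names are likewise assumptions.

```python
def distance_to_picture_boundary(x0, y0, width, height, pic_w, pic_h):
    """Smallest distance, in samples, from an NN block to any picture edge."""
    return min(x0, y0, pic_w - (x0 + width), pic_h - (y0 + height))


def weight_for_boundary(distance, threshold, default_weight, near_weight):
    """Apply a boundary condition of the form D < T (hypothetical policy)."""
    return near_weight if distance < threshold else default_weight


d_edge = distance_to_picture_boundary(0, 128, 128, 128, 1920, 1080)
d_mid = distance_to_picture_boundary(896, 448, 128, 128, 1920, 1080)
print(d_edge, d_mid, weight_for_boundary(d_edge, 8, 0.75, 1.0))
```

The call prints `0 448 1.0`: the block touching the left picture edge takes the near-boundary weight, while a block 448 samples from the nearest edge keeps the default. The "larger than" and "equal to" variants of claim 26 would only change the comparison.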

27. The method (1000) of any of claims 1 to 26, wherein there is a transition of weight factors over an area (700) of sample values.

28. The method (1000) of any of claims 1 to 27, wherein at least a first rule is used to determine whether and / or how to apply the first weight factor, and the first rule is based on at least one of: a location of the NN block (504; 700), a type of coding scheme, a presence of a boundary, a slice type, or a delta value.

29. The method (1000) of any of claims 1 to 28, wherein the weighting information further comprises one or more syntax elements describing a function for blending of the first-NNLF-output and the non-first-NNLF-output.

30. The method (1000) of claim 29, wherein the function is one or more of: a function of the position of the values in the NN block (504; 700), a linear function, a piecewise linear function, or a nonlinear function.
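Claims 27 and 30 allow the weight to vary with sample position, for example as a piecewise linear transition over an area. The sketch below assumes the weight ramps linearly from `w_near` at the left edge of the NN block to `w_far` over `ramp_len` samples and stays constant beyond that; the choice of the left edge as reference and all names are illustrative assumptions.

```python
def ramp_weight(d, ramp_len, w_near, w_far):
    """Piecewise linear weight: linear over [0, ramp_len), constant after."""
    if d >= ramp_len:
        return w_far
    return w_near + (w_far - w_near) * d / ramp_len


def weight_map(width, height, ramp_len, w_near, w_far):
    """Weight per sample, with d = horizontal distance to the block's left edge."""
    return [[ramp_weight(x, ramp_len, w_near, w_far) for x in range(width)]
            for _ in range(height)]


print(weight_map(6, 1, 4, 0.0, 1.0))
```

The call prints `[[0.0, 0.25, 0.5, 0.75, 1.0, 1.0]]`. Each entry would then be used as the per-sample weight in the blend, giving a smooth transition instead of a hard switch between the two filter outputs.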

31. The method (1000) of any of claims 1 to 30, wherein the size of the NN block (504; 700) determines the weighting between the first-NNLF-output and the non-first-NNLF-output.

32. The method (1000) of any of claims 1 to 31, wherein the weighting information comprises a component specifying an NN block size threshold, and the NN block size threshold is based on: a tier and / or level value, a picture size, or a number of CUs or NN blocks or patches in the picture.

33. The method (1000) of any of claims 1 to 32, wherein the first weight factor is determined based on an NN block size or an NN block size threshold.

34. The method (1000) of any of claims 1 to 33, wherein a second-NNLF-output is produced using i) the first NNLF (404) based on the NN block (504; 700) or ii) a second NNLF based on the NN block (504; 700), and the weighting information further specifies one or more weight factors specifying a weighting between the first-NNLF-output and the second-NNLF-output.

35. The method (1000) of any of claims 1 to 34, wherein a second-NNLF-output is produced using the first NNLF (404) based on the NN block (504; 700), and a third-NNLF-output is produced using i) the first NNLF (404) based on the NN block (504; 700) or ii) a second NNLF based on the NN block (504; 700), and the method further comprises: producing the first-NNLF-output by blending the second-NNLF-output with the third-NNLF-output; and producing blended output by blending the first-NNLF-output with the non-first-NNLF-output using the first weight factor.
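Claim 35 describes a two-stage blend: two NNLF outputs are first mixed into the first-NNLF-output, which is then mixed with the non-first-NNLF-output using the first weight factor. The sketch below assumes both stages are convex combinations in floating point; the names `blend`, `cascade` and `w_nn_mix` (the first-stage weight) are hypothetical.

```python
def blend(x, y, w):
    """Convex combination w * x + (1 - w) * y, sample by sample."""
    return [[w * a + (1.0 - w) * b for a, b in zip(rx, ry)] for rx, ry in zip(x, y)]


def cascade(second_nn, third_nn, non_nn, w_nn_mix, w_first):
    """Two-stage blending of NNLF outputs and a non-NN filter output."""
    first_nn = blend(second_nn, third_nn, w_nn_mix)  # stage 1: first-NNLF-output
    return blend(first_nn, non_nn, w_first)          # stage 2: final blended output


print(cascade([[100]], [[120]], [[90]], 0.5, 0.5))
```

The first stage gives 110, and blending that evenly with 90 prints `[[100.0]]`. Keeping the stages separate lets each weight be signalled at its own granularity.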

36. The method (1000) of any of claims 1 to 35, wherein the weighting information further comprises an indicator indicating whether or not to do a weighting between the first-NNLF-output and the non-first-NNLF-output.

37. The method (1000) of claim 36, wherein the indicator has a scope, and the scope of the indicator is an NN block, a CU, a CTU, a tile, a slice, a subpicture, a picture, a group of pictures, or a video sequence.

38. A method (1100) for producing a video bitstream, the method comprising: obtaining (s1102) weighting information including at least a first weighting information component and a second weighting information component specifying one or more weight factors specifying a weighting between first-neural network loop filter, NNLF, -output and non-first-NNLF-output, wherein the one or more weight factors include at least a first weight factor, and the first-NNLF-output is produced using a first NNLF (404), based on a neural network, NN, block (504; 700) of an input block (502) of a picture (600); and including (s1104) the weighting information in the video bitstream.

39. The method (1100) of claim 38, further comprising any one of claims 2 to 37.

40. A computer program (1243) comprising instructions (1244) which, when executed by processing circuitry (1202) of an apparatus (1200), cause the apparatus to perform the method of any one of claims 1 to 39.

41. A carrier containing the computer program of claim 40, wherein the carrier is one of an electronic signal, an optical signal, a radio signal, and a computer readable storage medium (1242).

42. A decoding apparatus (104; 1200) for decoding a picture (600) from a video bitstream, the decoding apparatus being configured to: obtain weighting information including at least a first weighting information component and a second weighting information component specifying one or more weight factors specifying a weighting between first-neural network loop filter, NNLF, -output and non-first-NNLF-output, wherein the one or more weight factors include at least a first weight factor, and the first-NNLF-output is produced using a first NNLF (404), based on an NN block (504; 700) of an input block (502) of the picture (600); and decode the picture (600) using the first and the second weighting information components.

43. The decoding apparatus (104; 1200) of claim 42, wherein the decoding apparatus is further configured to perform the method of any one of claims 2 to 37.

44. An encoding apparatus (102; 1200) for producing a video bitstream, the encoding apparatus being configured to: obtain weighting information including at least a first weighting information component and a second weighting information component specifying one or more weight factors specifying a weighting between first-neural network loop filter, NNLF, -output and non-first-NNLF-output, wherein the one or more weight factors include at least a first weight factor, and the first-NNLF-output is produced using a first NNLF (404), based on an NN block (504; 700) of an input block (502) of a picture (600); and include the weighting information in the bitstream.

45. The encoding apparatus (102; 1200) of claim 44, wherein the encoding apparatus is further configured to perform the method of any one of claims 2, 4 to 10, and 12 to 37.