A method for encoding / decoding video, a method for transmitting a bitstream, and a recording medium for storing a bitstream.

The video encoding/decoding method addresses the high-cost challenge of high-resolution video by using NNPF SEI messages to enhance encoding/decoding efficiency and reduce decoder errors, improving coding quality and clarity.

JP2026521919A · Pending · Publication Date: 2026-07-02 · LG ELECTRONICS INC

Patent Information

Authority / Receiving Office
JP · JP
Patent Type
Applications
Current Assignee / Owner
LG ELECTRONICS INC
Filing Date
2024-06-27
Publication Date
2026-07-02

AI Technical Summary

Technical Problem

The increasing demand for high-resolution, high-quality video has led to higher transmission and storage costs due to the increased amount of information, necessitating more efficient video compression technologies.

Method used

A video encoding/decoding method that includes obtaining and signaling output picture information through neural-network post-filter (NNPF) supplemental enhancement information (SEI) messages to improve encoding/decoding efficiency and clarify the meaning of output pictures, reducing decoder errors and enhancing coding quality.

Benefits of technology

The method improves encoding/decoding efficiency by clarifying output picture information, reducing decoder errors, and enhancing coding quality, while allowing for clearer communication of NNPF-related SEI messages.


Smart Images

  • Figure 2026521919000001_ABST
Patent Text Reader

Abstract

A video encoding / decoding method, a bitstream transmission method, and a computer-readable recording medium for storing a bitstream are provided. The video decoding method according to this disclosure includes the steps of: obtaining post-filter-based output picture information for an input picture from an NNPF (neural-network post-filter) related SEI (supplemental enhancement information) message; and obtaining an output picture for the input picture based on the output picture information, wherein the output picture information may include output picture output information, which is information indicating whether or not an output picture for the input picture is output.

Description

[Technical Field]

[0001] This disclosure relates to a video encoding / decoding method, a bitstream transmission method, and a recording medium storing a bitstream, and more particularly to a video encoding / decoding method, a bitstream transmission method, and a recording medium storing a bitstream related to a neural network post-filter.

[Background Art]

[0002] In recent years, demand for high-resolution, high-quality video, such as HD (High Definition) and UHD (Ultra High Definition) video, has been increasing in various fields. The higher the resolution and quality of video data, the greater the amount of information or bits transmitted compared to existing video data. This increase in the amount of information or bits transmitted leads to increased transmission and storage costs.

[0003] Therefore, highly efficient video compression technology is desired to effectively transmit, store, and play back high-resolution, high-quality video information.

[Summary of the Invention]

[Problems to Be Solved by the Invention]

[0004] The purpose of this disclosure is to provide a video encoding / decoding method and apparatus with improved encoding / decoding efficiency.

[0005] Furthermore, this disclosure aims to provide a method for processing NNPF-related SEI messages (NNPFC SEI and NNPFA SEI).

[0006] Furthermore, this disclosure aims to more clearly identify NNPF output pictures using NNPF-related SEI messages.

[0007] Furthermore, this disclosure aims to clarify the meaning of information related to NNPF output pictures.

[0008] Furthermore, this disclosure aims to reduce decoder errors by clarifying the meaning of information related to the output picture of the NNPF.

[0009] Furthermore, this disclosure aims to improve coding quality and efficiency by clarifying the meaning of information related to output pictures in NNPF.

[0010] Furthermore, this disclosure aims to improve coding efficiency by determining whether or not an output picture is output when no input picture exists.

[0011] Furthermore, this disclosure aims to provide a non-transitory computer-readable recording medium for storing a bitstream generated by the video encoding method relating to this disclosure.

[0012] Furthermore, this disclosure aims to provide a non-transitory computer-readable recording medium that stores a bitstream received and decoded by the video decoding device relating to this disclosure and used for video restoration.

[0013] Furthermore, this disclosure aims to provide a method for transmitting a bitstream generated by the video encoding method relating to this disclosure.

[0014] The technical challenges addressed in this disclosure are not limited to those mentioned above, and other technical challenges not mentioned above will be clearly understood by those with ordinary skill in the art to which this disclosure pertains from the following description.

[Means for Solving the Problems]

[0015] The video decoding method performed by a video decoding apparatus according to an embodiment of the present disclosure includes obtaining post-filter-based output picture information for an input picture from an NNPF (neural-network post-filter) related SEI (supplemental enhancement information) message, and obtaining an output picture for the input picture based on the output picture information, and the output picture information may include output picture output information that is information indicating whether an output picture for the input picture is output.

[0016] In one embodiment, the value of the output picture output information may be determined based on whether the input picture exists.

[0017] In one embodiment, the value of the output picture output information may be further determined based on output picture existence information, which is information indicating whether the output picture for the input picture exists.

[0018] In one embodiment, when the input picture does not exist and the output picture existence information indicates that the output picture exists, the output picture output information may indicate that the output picture is not output.

[0019] In one embodiment, the number of the output picture output information may be determined to be a value within a specific range.

[0020] In one embodiment, the specific range may be determined based on an index value of a specific input picture.

[0021] In one embodiment, the number of the output picture output information may always be determined to be a value greater than a specific value.

[0022] In one embodiment, the output of the output picture for a non-existent input picture may be excluded from the NNPF-based filtering process.

[0023] In one embodiment, the output picture existence information may be obtained from an NNPFC SEI message.

[0024] In one embodiment, the output picture output information may be obtained from an NNPFA SEI message.
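The relationship between input picture presence, output picture existence information, and output picture output information described in the embodiments above can be illustrated with a minimal sketch. The function name, the list-based inputs, and the treatment of a non-existent output picture as "not output" are assumptions made for illustration only; they are not syntax elements or semantics taken from this disclosure or from any SEI specification.

```python
from typing import List


def derive_output_flags(
    input_present: List[bool],
    output_exists: List[bool],
    signaled_output: List[bool],
) -> List[bool]:
    """Hypothetical derivation of per-picture "output picture output information".

    input_present   -- whether each input picture exists
    output_exists   -- hypothetical "output picture existence information"
                       (e.g. carried by an NNPFC-like message)
    signaled_output -- hypothetical signaled "output picture output information"
                       (e.g. carried by an NNPFA-like message)
    """
    flags = []
    for present, exists, signaled in zip(input_present, output_exists, signaled_output):
        if not exists:
            # Assumption: a picture that the post-filter does not generate
            # cannot be output.
            flags.append(False)
        elif not present:
            # Input picture missing although an output picture would exist:
            # the output picture is treated as not output.
            flags.append(False)
        else:
            # Otherwise the signaled value is used as-is.
            flags.append(signaled)
    return flags


flags = derive_output_flags(
    input_present=[True, False, True],
    output_exists=[True, True, False],
    signaled_output=[True, True, True],
)
print(flags)
```

In this sketch the second entry is suppressed because its input picture is missing, which mirrors the case where the output picture existence information indicates an output picture but the input picture does not exist.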

[0025] A video encoding method performed by a video encoding device according to one embodiment of the present disclosure includes the steps of determining post-filter-based output picture information for an input picture, and signaling the output picture information with an NNPF (neural-network post-filter) related SEI (supplemental enhancement information) message, wherein the output picture information may include output picture output information which indicates whether or not an output picture is output for the input picture.

[0026] Furthermore, this disclosure may provide a non-transitory computer-readable recording medium for storing a bitstream generated by the video encoding method relating to this disclosure.

[0027] Furthermore, this disclosure provides a non-transitory computer-readable recording medium for storing a bitstream that is received and decoded by the video decoding device relating to this disclosure and used for restoring video.

[0028] Furthermore, this disclosure provides a method for transmitting a bitstream generated by a video encoding method.

[0029] The features of this disclosure briefly summarized above are merely illustrative examples of the detailed description of this disclosure described below and do not limit the scope of this disclosure.

[Effects of the Invention]

[0030] According to this disclosure, it is possible to provide a video encoding / decoding method and apparatus with improved encoding / decoding efficiency.

[0031] Furthermore, this disclosure will allow for the modification of the semantics of information within NNPF-related SEI messages, enabling clearer communication of meaning.

[0032] Furthermore, this disclosure allows for the correction of the semantics of information within NNPF-related SEI messages, thereby reducing decoder errors.

[0033] Furthermore, according to this disclosure, efficiency can be improved by clarifying the output of the output picture using NNPF-related SEI messages.

[0034] Furthermore, according to this disclosure, efficiency can be improved by more clearly identifying information about the NNPF output picture through NNPF-related SEI messages.

[0035] Furthermore, according to this disclosure, NNPF can improve coding quality and efficiency by clarifying the meaning of information related to the output picture.

[0036] Furthermore, this disclosure makes it possible to provide a non-transitory computer-readable recording medium for storing a bitstream generated by the video encoding method relating to this disclosure.

[0037] Furthermore, this disclosure makes it possible to provide a non-transitory computer-readable recording medium that stores a bitstream received and decoded by the video decoding device relating to this disclosure and used for video restoration.

[0038] Furthermore, this disclosure provides a method for transmitting a bitstream generated by a video encoding method.

[0039] The effects obtained from this disclosure are not limited to those mentioned above, and any other effects not mentioned above will be clearly understood by a person with ordinary skill in the art to which this disclosure pertains from the following description.

[Brief Description of the Drawings]

[0040]
[Figure 1] A schematic diagram showing a video coding system to which the embodiments of this disclosure can be applied.
[Figure 2] A schematic diagram showing a video encoding device to which the embodiments of this disclosure can be applied.
[Figure 3] A schematic diagram showing a video decoding device to which the embodiments of this disclosure can be applied.
[Figure 4] A diagram illustrating the interleaved method for luma channel derivation.
[Figure 5] A flowchart illustrating a video decoding method to which the embodiments of this disclosure can be applied.
[Figure 6] A flowchart illustrating a video encoding method to which the embodiments of this disclosure can be applied.
[Figure 7] A diagram illustrating a content streaming system to which the embodiments of this disclosure can be applied.

[Modes for Carrying Out the Invention]

[0041] Hereafter, embodiments of the present disclosure will be described in detail with reference to the accompanying drawings, so that they can be easily implemented by a person with ordinary skill in the art to which the present disclosure pertains. However, the present disclosure may be embodied in various other forms and is not limited to the embodiments described herein.

[0042] In describing embodiments of this disclosure, if a specific description of a known configuration or function is deemed to obscure the gist of this disclosure, such detailed description will be omitted. In the figures, parts unrelated to the description of this disclosure will be omitted, and similar parts will be denoted by similar reference numerals.

[0043] In this disclosure, when one component is described as being “linked,” “joined,” or “connected” to another component, this may include not only direct linkages but also indirect linkages where other components exist in between. Furthermore, when one component is described as “containing” or “having” another component, this means, unless otherwise specified, that it may contain further other components rather than excluding them.

[0044] In this disclosure, terms such as "first," "second," etc., are used solely to distinguish one component from another, and do not limit the order or importance of the components unless otherwise specified. Therefore, within the scope of this disclosure, a first component in one embodiment may be referred to as a second component in another embodiment, and similarly, a second component in one embodiment may be referred to as a first component in another embodiment.

[0045] In this disclosure, components are distinguished from each other solely to clearly describe their respective characteristics, and this does not necessarily mean that these components are separate. That is, multiple components may be integrated to constitute a single hardware or software unit, or a single component may be distributed to constitute multiple hardware or software units. Therefore, such integrated or distributed embodiments are also included in the scope of this disclosure, even without specific mention.

[0046] In this disclosure, the components described in various embodiments are not necessarily essential components, and some may be optional components. Therefore, embodiments consisting of a subset of the components described in one embodiment are also included in the scope of this disclosure. Furthermore, embodiments that further include other components in addition to the components described in various embodiments are also included in the scope of this disclosure.

[0047] This disclosure relates to the encoding and decoding of video, and unless otherwise defined herein, the terms used herein may have their ordinary meanings in the art to which this disclosure pertains.

[0048] In this disclosure, "picture" generally refers to a unit representing a single video image for a specific time period, and "slice / tile" is an encoding unit that constitutes a part of a picture. A single picture may consist of one or more slices / tiles. A slice / tile may also contain one or more CTUs (coding tree units).

[0049] In this disclosure, “pixel” or “pel” can mean the smallest unit that constitutes a picture (or video). The term “sample” may also be used as a term corresponding to pixel. A sample may generally represent a pixel or a pixel value, or it may represent only the pixel / pixel value of the luma component, or only the pixel / pixel value of the chroma component.

[0050] In this disclosure, “unit” may represent a basic unit of image processing. A unit may include at least one of a specific region of a picture and information associated with that region. Depending on the case, “unit” may be used interchangeably with terms such as “sample array,” “block,” or “area.” In general, an MxN block may include a sample (or sample array) or a set (or array) of transform coefficients consisting of M columns and N rows.

[0051] In this disclosure, “current block” can mean one of the following: “current coding block,” “current coding unit,” “block to encode,” “block to decode,” or “block to process.” When prediction is performed, “current block” can mean “current prediction block” or “block to predict.” When transformation (inverse transformation) / quantization (inverse quantization) is performed, “current block” can mean “current transformation block” or “block to transform.” When filtering is performed, “current block” can mean “block to filter.”

[0052] In this disclosure, "current block" may mean a block containing both a luma component block and a chroma component block, or "the luma block of the current block," unless otherwise explicitly stated as a chroma block. The luma component block of the current block may be expressed with an explicit mention of a luma component block, such as "luma block" or "current luma block." Similarly, the chroma component block of the current block may be expressed with an explicit mention of a chroma component block, such as "chroma block" or "current chroma block."

[0053] In this disclosure, " / " and "," may be interpreted as "and / or." For example, "A / B" and "A, B" may be interpreted as "A and / or B." Also, "A / B / C" and "A, B, C" may mean "at least one of A, B and / or C."

[0054] In this disclosure, “or” may be interpreted as “and / or.” For example, “A or B” may mean 1) “A” only, 2) “B” only, or 3) “A and B.” Alternatively, in this disclosure, “or” may mean “additionally or alternatively.”

[0055] Overview of the video coding system

[0056] Figure 1 is a schematic diagram showing a video coding system to which the embodiments of this disclosure can be applied.

[0057] A video coding system according to one embodiment may include an encoding device 10 and a decoding device 20. The encoding device 10 can transmit encoded video and / or image information or data to the decoding device 20 in file or streaming form via a digital storage medium or network.

[0058] An encoding device 10 according to one embodiment may include a video source generation unit 11, an encoding unit 12, and a transmission unit 13. A decoding device 20 according to one embodiment may include a receiving unit 21, a decoding unit 22, and a rendering unit 23. The encoding unit 12 may be called a video / image encoding unit, and the decoding unit 22 may be called a video / image decoding unit. The transmission unit 13 may be included in the encoding unit 12. The receiving unit 21 may be included in the decoding unit 22. The rendering unit 23 may include a display unit, and the display unit may be composed of a separate device or external component.

[0059] The video source generation unit 11 can acquire video / images through video / image capture, synthesis, or generation processes. The video source generation unit 11 may include a video / image capture device and / or a video / image generation device. The video / image capture device may include, for example, one or more cameras, or a video / image archive containing previously captured video / images. The video / image generation device may include, for example, a computer, a tablet, and a smartphone, and can generate video / images (electronically). For example, virtual video / images may be generated by a computer, in which case the video / image capture process may be replaced by a process in which related data is generated.

[0060] The encoding unit 12 can encode the input video / image data. The encoding unit 12 can perform a series of procedures such as prediction, transformation, and quantization for compression and encoding efficiency. The encoding unit 12 can output the encoded data (encoded video / image information) in the form of a bitstream.

[0061] The transmitting unit 13 can acquire encoded video / image information or data output in bitstream form and transmit it in file or streaming form to the receiving unit 21 of the decoding device 20 or other external object via a digital storage medium or network. The digital storage medium may include various storage media such as USB, SD, CD, DVD, Blu-ray (registered trademark: same hereinafter), HDD, SSD, etc. The transmitting unit 13 may include elements for generating media files in a predetermined file format and may include elements for transmission via a broadcast / communication network. The transmitting unit 13 may be provided as a transmission device separate from the encoding unit 12, in which case the transmission device may include at least one processor that acquires encoded video / image information or data output in bitstream form and a transmitting unit that transmits it in file or streaming form. The receiving unit 21 can extract / receive the bitstream from the storage medium or network and transmit it to the decoding unit 22.

[0062] The decoding unit 22 can decode the video / image by performing a series of procedures such as inverse quantization, inverse transform, and prediction, which correspond to the operation of the encoding unit 12.

[0063] The rendering unit 23 can render the decoded video / image. The rendered video / image may be displayed through the display unit.

[0064] Overview of the video encoding device

[0065] Figure 2 is a schematic diagram showing a video encoding device to which the embodiments of this disclosure can be applied.

[0066] As shown in Figure 2, the video encoding device 100 may include a video splitting unit 110, a subtraction unit 115, a transform unit 120, a quantization unit 130, an inverse quantization unit 140, an inverse transform unit 150, an addition unit 155, a filtering unit 160, a memory 170, an inter-prediction unit 180, an intra-prediction unit 185, and an entropy encoding unit 190. The inter-prediction unit 180 and the intra-prediction unit 185 may be collectively called the "prediction unit". The transform unit 120, the quantization unit 130, the inverse quantization unit 140, and the inverse transform unit 150 may be included in a residual processing unit. The residual processing unit may further include the subtraction unit 115.

[0067] Depending on the embodiment, all or at least some of the multiple components constituting the video encoding device 100 may be embodied as a single hardware component (e.g., an encoder or a processor). Furthermore, the memory 170 may include a DPB (decoded picture buffer) and may be embodied by a digital storage medium.

[0068] The video splitting unit 110 can split the input video (or picture, frame) input to the video encoding device 100 into one or more processing units. For example, the processing units may be called coding units (CUs). Coding units can be obtained by recursively splitting a coding tree unit (CTU) or the largest coding unit (LCU) using a QT / BT / TT (quad-tree / binary-tree / ternary-tree) structure. For example, one coding unit may be split into multiple coding units of deeper depth based on a quad-tree structure, a binary-tree structure, and / or a ternary-tree structure. For the splitting of coding units, a quad-tree structure may be applied first, followed by a binary-tree structure and / or a ternary-tree structure. The coding procedure according to this disclosure may be performed based on the final coding unit that is not split further. The largest coding unit may be used directly as the final coding unit, or a coding unit of deeper depth obtained by splitting the largest coding unit may be used as the final coding unit. Here, the coding procedure may include procedures such as prediction, transform, and / or reconstruction, which will be described later. As another example, the processing unit of the coding procedure may be a prediction unit (PU) or a transform unit (TU). The prediction unit and the transform unit may each be split or partitioned from the final coding unit. The prediction unit may be a unit of sample prediction, and the transform unit may be a unit that derives transform coefficients and / or a unit that derives a residual signal from transform coefficients.
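The recursive QT / BT / TT splitting described above can be illustrated with a minimal sketch. The split-mode names, the `choose` callback that stands in for the encoder's split decision, and the 1:2:1 ratio used for the ternary split are assumptions for illustration; this disclosure does not specify them.

```python
from typing import Callable, List, Tuple

Block = Tuple[int, int, int, int]  # (x, y, width, height)


def split_block(block: Block, mode: str) -> List[Block]:
    """Split one block according to a hypothetical split-mode label."""
    x, y, w, h = block
    if mode == "QT":  # quad-tree: four equal quadrants
        hw, hh = w // 2, h // 2
        return [(x, y, hw, hh), (x + hw, y, hw, hh),
                (x, y + hh, hw, hh), (x + hw, y + hh, hw, hh)]
    if mode == "BT_V":  # binary-tree, vertical split line
        return [(x, y, w // 2, h), (x + w // 2, y, w // 2, h)]
    if mode == "BT_H":  # binary-tree, horizontal split line
        return [(x, y, w, h // 2), (x, y + h // 2, w, h // 2)]
    if mode == "TT_V":  # ternary-tree, assumed 1:2:1 widths
        q = w // 4
        return [(x, y, q, h), (x + q, y, 2 * q, h), (x + 3 * q, y, q, h)]
    if mode == "TT_H":  # ternary-tree, assumed 1:2:1 heights
        q = h // 4
        return [(x, y, w, q), (x, y + q, w, 2 * q), (x, y + 3 * q, w, q)]
    raise ValueError(f"unknown split mode: {mode}")


def partition(block: Block, choose: Callable[[Block], str]) -> List[Block]:
    """Recursively split until `choose` returns "NONE"; return the final CUs."""
    mode = choose(block)
    if mode == "NONE":
        return [block]
    leaves: List[Block] = []
    for child in split_block(block, mode):
        leaves.extend(partition(child, choose))
    return leaves


def example_choose(block: Block) -> str:
    """Toy decision: quad-split the 128x128 CTU, then binary-split one quadrant."""
    x, y, w, h = block
    if (w, h) == (128, 128):
        return "QT"
    if (x, y, w, h) == (0, 0, 64, 64):
        return "BT_V"
    return "NONE"


final_cus = partition((0, 0, 128, 128), example_choose)
print(final_cus)
```

A real encoder would typically pick the split mode by comparing rate-distortion costs and would enforce minimum block sizes; the callback here only fixes a small example tree.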

[0069] The prediction unit (inter-prediction unit 180 or intra-prediction unit 185) can perform prediction on the block to be processed (current block) and generate a predicted block that includes prediction samples for the current block. The prediction unit can determine whether intra-prediction or inter-prediction is applied on a current-block or CU basis. The prediction unit can generate various information regarding the prediction of the current block and transmit it to the entropy encoding unit 190. The prediction information may be encoded by the entropy encoding unit 190 and output in the form of a bitstream.

[0070] The intra-prediction unit 185 can predict the current block by referring to a sample in the current picture. The referenced sample may be located in the vicinity (neighbor) or away from the current block, depending on the intra-prediction mode and / or intra-prediction method. The intra-prediction mode may include a plurality of non-directional modes and a plurality of directional modes. The non-directional modes may include, for example, a DC mode and a planar mode. The directional modes may include, for example, 33 directional prediction modes or 65 directional prediction modes, depending on the accuracy of the prediction direction. However, this is an example, and more or fewer directional prediction modes may be used depending on the settings. The intra-prediction unit 185 can also determine the prediction mode to be applied to the current block using the prediction modes applied to the surrounding blocks.
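As a simple illustration of a non-directional mode, the following sketch fills a block with the rounded average of its neighboring reference samples, which is the basic idea behind DC prediction. It is a simplified sketch: the function name and the choice of using all top and left neighbors are assumptions, and practical codecs apply additional rules (for example, for non-square blocks or unavailable neighbors).

```python
from typing import List


def dc_predict(top: List[int], left: List[int], width: int, height: int) -> List[List[int]]:
    """Simplified DC intra-prediction: every sample equals the rounded mean
    of the reconstructed neighbors above and to the left of the block."""
    neighbours = list(top[:width]) + list(left[:height])
    if not neighbours:
        raise ValueError("at least one reference sample is required")
    # Integer mean with rounding to nearest.
    dc = (sum(neighbours) + len(neighbours) // 2) // len(neighbours)
    return [[dc] * width for _ in range(height)]


predicted = dc_predict(
    top=[100, 102, 104, 106],
    left=[98, 100, 102, 104],
    width=4,
    height=4,
)
print(predicted[0])
```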

[0071] The inter-prediction unit 180 can derive a predicted block for the current block based on a reference block (reference sample array) identified by a motion vector on a reference picture. In this case, in order to reduce the amount of motion information transmitted in inter-prediction mode, motion information can be predicted in units of blocks, subblocks, or samples based on the correlation of motion information between the surrounding blocks and the current block. The motion information may include motion vectors and reference picture indices. The motion information may further include inter-prediction direction information (L0 prediction, L1 prediction, Bi prediction, etc.). In the case of inter-prediction, the surrounding blocks may include spatial neighboring blocks present in the current picture and temporal neighboring blocks present in the reference picture. The reference picture containing the reference block and the reference picture containing the temporal neighboring block may be the same or different from each other. The temporal neighboring block may be called a collocated reference block, colCU, etc. The reference picture containing the temporal neighboring block may be called a collocated picture (colPic). For example, the inter-prediction unit 180 can construct a motion information candidate list based on surrounding blocks and generate information indicating which candidate is used to derive the motion vector and / or reference picture index of the current block. Inter-prediction may be performed based on various prediction modes; for example, in skip mode and merge mode, the inter-prediction unit 180 can use the motion information of surrounding blocks as the motion information of the current block. In skip mode, unlike merge mode, the residual signal does not need to be transmitted. In motion vector prediction (MVP) mode, the motion vectors of surrounding blocks are used as motion vector predictors, and the motion vector of the current block can be signaled by encoding the motion vector difference and an indicator for the motion vector predictor. The motion vector difference represents the difference between the motion vector of the current block and the motion vector predictor.
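The MVP-mode signaling described above amounts to sending a predictor index plus a motion vector difference, and adding them back at the decoder. The sketch below shows that round trip. The function names and the rule for picking the predictor (smallest sum of absolute differences) are assumptions for illustration, not the selection method of this disclosure.

```python
from typing import List, Tuple

MotionVector = Tuple[int, int]  # (horizontal, vertical)


def encode_motion_vector(
    mv: MotionVector, candidates: List[MotionVector]
) -> Tuple[int, MotionVector]:
    """Pick a predictor from the candidate list and return (index, MVD)."""
    # Assumed criterion: the candidate closest to the actual motion vector.
    index = min(
        range(len(candidates)),
        key=lambda i: abs(mv[0] - candidates[i][0]) + abs(mv[1] - candidates[i][1]),
    )
    mvp = candidates[index]
    mvd = (mv[0] - mvp[0], mv[1] - mvp[1])  # MVD = MV - MVP
    return index, mvd


def decode_motion_vector(
    index: int, mvd: MotionVector, candidates: List[MotionVector]
) -> MotionVector:
    """Rebuild the motion vector from the signaled index and MVD."""
    mvp = candidates[index]
    return (mvp[0] + mvd[0], mvp[1] + mvd[1])  # MV = MVP + MVD


candidates = [(0, 0), (4, -2)]  # e.g. motion vectors of surrounding blocks
index, mvd = encode_motion_vector((5, -3), candidates)
print(index, mvd, decode_motion_vector(index, mvd, candidates))
```

Both sides must build the same candidate list for the index to be meaningful, which is why the list is derived from already-coded surrounding blocks.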

[0072] The prediction unit can generate a prediction signal based on various prediction methods and / or prediction techniques described later. For example, the prediction unit may apply intra-prediction or inter-prediction to predict the current block, or it may apply intra-prediction and inter-prediction simultaneously. A prediction method that applies intra-prediction and inter-prediction simultaneously to predict the current block may be called CIIP (combined inter and intra prediction). The prediction unit can also perform intra-block copy (IBC) to predict the current block. Intra-block copy may be used, for example, for coding content images / videos such as games, as in SCC (screen content coding). IBC is a method of predicting the current block using an already reconstructed reference block in the current picture located at a predetermined distance from the current block. When IBC is applied, the position of the reference block in the current picture may be encoded as a vector (block vector) corresponding to the predetermined distance. IBC basically performs prediction within the current picture, but it may be performed similarly to inter-prediction in that it derives the reference block within the current picture. In other words, IBC can use at least one of the inter-prediction methods described in this disclosure.
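The block-vector idea behind IBC can be sketched as a copy from a displaced position in the same picture. The function name and the picture representation (a list of sample rows) are assumptions for illustration. The sketch only checks picture bounds; an actual codec would also require the referenced area to be already reconstructed.

```python
from typing import List, Tuple


def ibc_predict(
    picture: List[List[int]],
    x: int, y: int, width: int, height: int,
    block_vector: Tuple[int, int],
) -> List[List[int]]:
    """Predict the block at (x, y) by copying the block displaced by
    `block_vector` = (dx, dy) within the same picture."""
    ref_x, ref_y = x + block_vector[0], y + block_vector[1]
    pic_h, pic_w = len(picture), len(picture[0])
    if ref_x < 0 or ref_y < 0 or ref_x + width > pic_w or ref_y + height > pic_h:
        raise ValueError("block vector points outside the picture")
    return [row[ref_x:ref_x + width] for row in picture[ref_y:ref_y + height]]


# 4x4 toy picture whose sample value encodes its position (row * 4 + column).
picture = [[r * 4 + c for c in range(4)] for r in range(4)]
print(ibc_predict(picture, x=2, y=2, width=2, height=2, block_vector=(-2, -2)))
```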

[0073] The predicted signal generated by the prediction unit may be used to generate a reconstructed signal or a residual signal. The subtraction unit 115 can generate a residual signal (residual block, residual sample array) by subtracting the predicted signal output from the prediction unit (predicted block, predicted sample array) from the input video signal (original block, original sample array). The generated residual signal may be transmitted to the transform unit 120.

[0074] The transform unit 120 can generate transform coefficients by applying a transform method to the residual signal. For example, the transform method may include at least one of the following: DCT (Discrete Cosine Transform), DST (Discrete Sine Transform), KLT (Karhunen-Loeve Transform), GBT (Graph-Based Transform), or CNT (Conditionally Non-linear Transform). Here, GBT refers to the transform obtained from a graph when the relationship information between pixels is represented by that graph. CNT refers to a transform obtained based on a prediction signal generated using all previously reconstructed pixels. The transform process may be applied to square pixel blocks of the same size, or to non-square blocks of variable size.
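As an illustration of one of the listed transforms, the following sketch implements a one-dimensional orthonormal DCT-II and its inverse in plain floating point. It is a textbook formulation for illustration only; practical codecs use scaled integer approximations and apply the transform separably to rows and columns. The function names are assumptions.

```python
import math
from typing import List


def _scale(k: int, n: int) -> float:
    """Orthonormal scaling factor for coefficient index k of an n-point DCT."""
    return math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)


def dct_ii(samples: List[float]) -> List[float]:
    """Forward 1-D orthonormal DCT-II of a residual row or column."""
    n = len(samples)
    return [
        _scale(k, n) * sum(
            samples[i] * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
            for i in range(n)
        )
        for k in range(n)
    ]


def inverse_dct(coeffs: List[float]) -> List[float]:
    """Inverse of dct_ii (recovers the samples from the coefficients)."""
    n = len(coeffs)
    return [
        sum(
            _scale(k, n) * coeffs[k] * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
            for k in range(n)
        )
        for i in range(n)
    ]


residual_row = [2.0, -3.0, 0.0, 1.0]
coefficients = dct_ii(residual_row)
print(coefficients)
print(inverse_dct(coefficients))
```

A flat input concentrates all of its energy in the first (DC) coefficient, which is the property that makes such transforms useful before quantization.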

[0075] The quantization unit 130 can quantize the transform coefficients and transmit them to the entropy encoding unit 190. The entropy encoding unit 190 can encode the quantized signal (information about the quantized transform coefficients) and output it as a bitstream. The information about the quantized transform coefficients may be called residual information. The quantization unit 130 can rearrange the block-shaped quantized transform coefficients into a one-dimensional vector form based on the coefficient scan order, and can also generate information about the quantized transform coefficients based on the one-dimensional vector form of the quantized transform coefficients.
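The two steps described above, quantizing a coefficient block and rearranging it into a one-dimensional vector, can be sketched as follows. The uniform quantizer with a single step size and the anti-diagonal scan order are assumptions chosen for illustration; this disclosure does not specify the quantizer or the scan order.

```python
from typing import List


def quantize(coeffs: List[List[int]], step: int) -> List[List[int]]:
    """Uniform scalar quantization with rounding to the nearest level."""
    def level(c: int) -> int:
        magnitude = (abs(c) + step // 2) // step
        return magnitude if c >= 0 else -magnitude
    return [[level(c) for c in row] for row in coeffs]


def diagonal_scan(block: List[List[int]]) -> List[int]:
    """Rearrange a 2-D block into a 1-D list along anti-diagonals,
    starting from the top-left (low-frequency) corner."""
    height, width = len(block), len(block[0])
    positions = sorted(
        ((r, c) for r in range(height) for c in range(width)),
        key=lambda rc: (rc[0] + rc[1], rc[0]),
    )
    return [block[r][c] for r, c in positions]


levels = quantize([[90, -41], [29, 4]], step=10)
print(levels)
print(diagonal_scan(levels))
```

Scanning from the low-frequency corner tends to place the zero-valued levels at the end of the vector, which helps the entropy coder that follows.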

[0076] The entropy encoding unit 190 can perform various encoding methods, such as exponential Golomb, CAVLC (context-adaptive variable length coding), and CABAC (context-adaptive binary arithmetic coding). In addition to the quantized transform coefficients, the entropy encoding unit 190 can also encode information necessary for video / image restoration (e.g., the values of syntax elements) together or separately. The encoded information (e.g., encoded video / image information) may be transmitted or stored in the form of a bitstream in units of NAL (network abstraction layer) units. The video / image information may further include information about various parameter sets, such as the adaptation parameter set (APS), picture parameter set (PPS), sequence parameter set (SPS), or video parameter set (VPS). The video / image information may also further include general constraint information. The signaling information, transmitted information, and / or syntax elements referred to in this disclosure may be encoded by the encoding procedure described above and included in the bitstream.
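Of the entropy coding methods named above, exponential Golomb coding is the simplest to show. The sketch below encodes and decodes unsigned order-0 Exp-Golomb codes, representing bits as a string of "0" and "1" characters for readability. The function names and the string representation are assumptions for illustration.

```python
def exp_golomb_encode(value: int) -> str:
    """Unsigned order-0 Exp-Golomb: write (value + 1) in binary, preceded by
    as many zero bits as there are bits after its leading one."""
    if value < 0:
        raise ValueError("unsigned Exp-Golomb requires a non-negative value")
    binary = bin(value + 1)[2:]
    return "0" * (len(binary) - 1) + binary


def exp_golomb_decode(bits: str) -> int:
    """Decode a single unsigned order-0 Exp-Golomb codeword from the start
    of `bits`."""
    leading_zeros = 0
    while bits[leading_zeros] == "0":
        leading_zeros += 1
    codeword = bits[leading_zeros:2 * leading_zeros + 1]
    return int(codeword, 2) - 1


for value in range(5):
    print(value, exp_golomb_encode(value))
```

Small values get short codewords, which suits syntax elements whose small values are the most frequent.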

[0077] The bitstream may be transmitted over a network or stored on a digital storage medium. Here, the network may include broadcasting networks and / or communication networks, and the digital storage medium may include various storage media such as USB, SD, CD, DVD, Blu-ray, HDD, SSD, etc. A transmitting unit (not shown) for transmitting the signal output from the entropy encoding unit 190 and / or a storage unit (not shown) for storing it may be provided as an internal / external element of the video encoding device 100, or the transmitting unit may be provided as a component of the entropy encoding unit 190.

[0078] The quantized transform coefficients output from the quantization unit 130 may be used to generate a residual signal. For example, by applying inverse quantization and inverse transformation to the quantized transform coefficients in the inverse quantization unit 140 and the inverse transformation unit 150, a residual signal (residual block or residual samples) can be reconstructed.

[0079] The adder 155 can generate a reconstructed signal (reconstructed picture, reconstructed block, reconstructed sample array) by adding the reconstructed residual signal to the predicted signal output from the inter-prediction unit 180 or the intra-prediction unit 185. When there is no residual for the block to be processed, such as when skip mode is applied, the predicted block may be used as the reconstructed block. The adder 155 may be called the reconstruction unit or the reconstructed block generation unit. The generated reconstructed signal may be used for intra-prediction of the next block to be processed in the current picture, or, as described later, may be used for inter-prediction of the next picture after filtering.
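
The addition performed by the adder can be sketched as follows. This is a minimal Python illustration, not the implementation of this disclosure: the function name reconstruct_block is hypothetical, blocks are modelled as lists of rows, and clipping the sum to the sample range given by bit_depth is an assumption added for the example.

```python
def reconstruct_block(pred, residual=None, bit_depth=8):
    """Add a residual block to a predicted block, sample by sample."""
    if residual is None:
        # No residual (e.g. skip mode): the predicted block is used as is.
        return [row[:] for row in pred]
    max_val = (1 << bit_depth) - 1
    # Assumption: sums are clipped to the range [0, 2^bit_depth - 1].
    return [
        [min(max(p + r, 0), max_val) for p, r in zip(pred_row, res_row)]
        for pred_row, res_row in zip(pred, residual)
    ]
```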

[0080] The filtering unit 160 can improve subjective / objective image quality by applying filtering to the restored signal. For example, the filtering unit 160 can apply various filtering methods to the restored picture to generate a modified restored picture, and store the modified restored picture in memory 170, specifically in the DPB of memory 170. The various filtering methods may include, for example, deblocking filtering, sample adaptive offset, adaptive loop filter, and bilateral filter. The filtering unit 160 can generate various filtering information and transmit it to the entropy encoding unit 190, as will be described later in the description of each filtering method. The filtering information may be encoded by the entropy encoding unit 190 and output in the form of a bitstream.

[0081] The modified restored picture transmitted to memory 170 may be used as a reference picture in the inter-prediction unit 180. This allows the video encoding device 100 to avoid prediction mismatches between the video encoding device 100 and the video decoding device when inter-prediction is applied, and also improves encoding efficiency.

[0082] The DPB in memory 170 can store the modified restored picture for use as a reference picture in the inter-prediction unit 180. Memory 170 can store motion information of blocks from which motion information in the current picture has been derived (or encoded) and / or motion information of blocks in the picture that have already been restored. The stored motion information may be transmitted to the inter-prediction unit 180 for use as motion information of spatially surrounding blocks or motion information of temporally surrounding blocks. Memory 170 can store restored samples of restored blocks in the current picture and transmit them to the intra-prediction unit 185.

[0083] Overview of the video decoding device

[0084] Figure 3 is a schematic diagram showing an image decoding device to which the embodiments of this disclosure can be applied.

[0085] As shown in Figure 3, the video decoding device 200 may include an entropy decoding unit 210, an inverse quantization unit 220, an inverse transformation unit 230, an addition unit 235, a filtering unit 240, a memory 250, an inter-prediction unit 260, and an intra-prediction unit 265. The inter-prediction unit 260 and the intra-prediction unit 265 can be collectively referred to as the "prediction unit". The inverse quantization unit 220 and the inverse transformation unit 230 may be included in the residual processing unit.

[0086] All or at least some of the multiple components constituting the video decoding device 200 may be embodied as a single hardware component (e.g., a decoder or processor) depending on the embodiment. Furthermore, the memory 250 may include a DPB and may be embodied by a digital storage medium.

[0087] A video decoding device 200 that receives a bitstream containing video / image information can restore the image by performing a process corresponding to the process performed by the video encoding device 100 in Figure 2. For example, the video decoding device 200 can perform decoding using the processing unit applied in the video encoding device. Therefore, the decoding processing unit may be, for example, a coding unit. The coding unit may be a coding tree unit, or it may be obtained by dividing the largest coding unit. The restored video signal decoded and output by the video decoding device 200 may then be played back by a playback device (not shown).

[0088] The video decoding device 200 can receive the signal output from the video encoding device shown in Figure 2 in the form of a bitstream. The received signal may be decoded by the entropy decoding unit 210. For example, the entropy decoding unit 210 can parse the bitstream to derive information necessary for video restoration (or picture restoration) (e.g., video / image information). The video / image information may further include information about various parameter sets, such as an adaptation parameter set (APS), a picture parameter set (PPS), a sequence parameter set (SPS), or a video parameter set (VPS). The video / image information may also further include general constraint information. The video decoding device may further utilize the parameter set information and / or the general constraint information to decode the video. The signaling information, received information, and / or syntax elements referred to in this disclosure may be obtained from the bitstream by decoding through the decoding procedure. For example, the entropy decoding unit 210 can decode information in the bitstream based on a coding method such as exponential Golomb coding, CAVLC, or CABAC, and output the values of syntax elements necessary for image restoration and the quantized values of transform coefficients related to the residual. More specifically, the CABAC entropy decoding method receives bins corresponding to each syntax element in the bitstream, determines a context model using the syntax element information to be decoded, the decoding information of the surrounding blocks and the block to be decoded, or the symbol / bin information decoded in a previous stage, predicts the probability of bin occurrence based on the determined context model, performs arithmetic decoding of the bins, and generates symbols corresponding to the values of each syntax element. After determining the context model, the CABAC entropy decoding method can update the context model using the decoded symbol / bin information for the context model of the next symbol / bin. Among the information decoded by the entropy decoding unit 210, information related to prediction is provided to the prediction unit (inter-prediction unit 260 and intra-prediction unit 265), and the residual values that have been entropy decoded by the entropy decoding unit 210, i.e., the quantized transform coefficients and related parameter information, may be input to the inverse quantization unit 220. In addition, among the information decoded by the entropy decoding unit 210, information related to filtering may be provided to the filtering unit 240. Meanwhile, a receiving unit (not shown) that receives signals output from the video encoding device may be further provided as an internal / external element of the video decoding device 200, or the receiving unit may be provided as a component of the entropy decoding unit 210.

[0089] On the other hand, the video decoding device according to this disclosure may be called a video / image / picture decoding device. The video decoding device may include an information decoder (video / image / picture information decoder) and / or a sample decoder (video / image / picture sample decoder). The information decoder may include an entropy decoding unit 210, and the sample decoder may include at least one of an inverse quantization unit 220, an inverse transformation unit 230, an addition unit 235, a filtering unit 240, a memory 250, an inter-prediction unit 260, and an intra-prediction unit 265.

[0090] The inverse quantization unit 220 can inverse quantize the quantized transformation coefficients and output the transformation coefficients. The inverse quantization unit 220 can rearrange the quantized transformation coefficients in a two-dimensional block form. In this case, the rearrangement may be performed based on the coefficient scan order performed by the video encoding device. The inverse quantization unit 220 can perform inverse quantization on the quantized transformation coefficients using quantization parameters (e.g., quantization step size information) and obtain the transformation coefficients.
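
The two steps described above, rearranging the coefficients into a two-dimensional block and scaling them, can be sketched in Python as follows. This is a simplified illustration under stated assumptions: the function name dequantize_block is hypothetical, the scan order is passed in explicitly as a list of (row, column) positions, and a single uniform step size is applied to every coefficient, which is simpler than the quantization-parameter handling of an actual codec.

```python
def dequantize_block(levels, scan_order, height, width, step_size):
    """Place scanned levels into a 2-D block and scale them by step_size."""
    block = [[0] * width for _ in range(height)]
    for level, (row, col) in zip(levels, scan_order):
        # Assumption: uniform scalar reconstruction, coefficient = level * step.
        block[row][col] = level * step_size
    return block
```

For example, with a 2x2 raster scan order [(0, 0), (0, 1), (1, 0), (1, 1)], the levels [3, -1, 0, 2] and a step size of 4 give the block [[12, -4], [0, 8]].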

[0091] The inverse transformation unit 230 can inverse transform the transform coefficients to obtain residual signals (residual blocks, residual sample arrays).

[0092] The prediction unit can make predictions for the current block and generate a predicted block containing prediction samples for the current block. Based on the prediction information output from the entropy decoding unit 210, the prediction unit can determine whether intra-prediction or inter-prediction is applied to the current block and can determine a specific intra / inter-prediction mode (prediction method).

[0093] As mentioned in the description of the prediction unit of the video encoding device 100, the prediction unit can generate prediction signals based on various prediction methods (techniques) described later.

[0094] The intra-prediction unit 265 can predict the current block by referring to the samples in the current picture. The description of the intra-prediction unit 185 may also apply to the intra-prediction unit 265.

[0095] The inter-prediction unit 260 can derive a predicted block for the current block based on a reference block (reference sample array) identified by motion vectors on a reference picture. In this case, in order to reduce the amount of motion information transmitted in inter-prediction mode, motion information can be predicted in block, subblock, or sample units based on the correlation of motion information between surrounding blocks and the current block. The motion information may include motion vectors and reference picture indices. The motion information may further include inter-prediction direction information (L0 prediction, L1 prediction, Bi prediction, etc.). In the case of inter-prediction, surrounding blocks may include spatial neighboring blocks present in the current picture and temporal neighboring blocks present in the reference picture. For example, the inter-prediction unit 260 can construct a motion information candidate list based on surrounding blocks and derive the motion vector and / or reference picture index of the current block based on the received candidate selection information. Inter-prediction may be performed based on various prediction modes (methods), and the prediction information may include information indicating the mode (method) of inter-prediction for the current block.

[0096] The adder 235 can generate a restored signal (restored picture, restored block, restored sample array) by adding the acquired residual signal to the predicted signal (predicted block, predicted sample array) output from the prediction unit (including the inter-prediction unit 260 and / or intra-prediction unit 265). When there is no residual for the block to be processed, such as when skip mode is applied, the predicted block may be used as the restored block. The description of the adder 155 may also apply to the adder 235. The adder 235 may be called the restorer unit or the restored block generation unit. The generated restored signal may be used for intra-prediction of the next block to be processed in the current picture, or, as described later, may be used for inter-prediction of the next picture after filtering.

[0097] The filtering unit 240 can improve subjective / objective image quality by applying filtering to the restored signal. For example, the filtering unit 240 can apply various filtering methods to the restored picture to generate a modified restored picture, and the modified restored picture can be stored in the memory 250, specifically in the DPB of the memory 250. The various filtering methods may include, for example, deblocking filtering, sample adaptive offset, adaptive loop filter, and bilateral filter.

[0098] The (modified) restored picture stored in the DPB of memory 250 may be used as a reference picture in the inter-prediction unit 260. Memory 250 can store motion information of blocks from which motion information in the current picture has been derived (or decoded) and / or motion information of blocks in the picture that have already been restored. The stored motion information can be transmitted to the inter-prediction unit 260 for use as motion information of spatially surrounding blocks or motion information of temporally surrounding blocks. Memory 250 can store restored samples of restored blocks in the current picture and transmit them to the intra-prediction unit 265.

[0099] In this specification, the embodiments described for the filtering unit 160, inter-prediction unit 180, and intra-prediction unit 185 of the video encoding device 100 may be applied identically or in a corresponding manner to the filtering unit 240, inter-prediction unit 260, and intra-prediction unit 265 of the video decoding device 200, respectively.

[0100] Neural network post-filter characteristics (NNPFC)

[0101] Tables 1 to 3, taken together, represent the NNPFC syntax structure.

[0102] [Table 1]

[0103] [Table 2]

[0104] [Table 3]

[0105] The NNPFC syntax structures shown in Tables 1 to 3 may be signaled in the form of SEI (supplemental enhancement information) messages. SEI messages that signal the NNPFC syntax structures shown in Tables 1 to 3 can be called NNPFC SEI messages.

[0106] NNPFC SEI messages can identify neural networks available as post-processing filters. The use of identified post-processing filters for a specific picture can be indicated using neural-network post-filter activation (NNPFA) SEI messages. Here, "post-processing filter" and "post-filter" may have the same meaning.

[0107] To use these SEI messages, the following variables may need to be defined:

[0108] - The cropped width and height of the input picture, in units of luma samples, may be represented by CroppedWidth and CroppedHeight, respectively.

[0109] - The luma sample array of the input picture, CroppedYPic[idx], and the chroma sample arrays, CroppedCbPic[idx] and CroppedCrPic[idx], if present, may be used as input to the NNPF, and the index idx may be in the range of 0 to numInputPics-1.

[0110] - BitDepthY can indicate the bit depth of the luma sample array of the input picture.

[0111] - BitDepthC can indicate the bit depth of the chroma sample arrays of the input picture, if these are present.

[0112] - ChromaFormatIdc can indicate a chroma format identifier.

[0113] - When the value of nnpfc_auxiliary_inp_idc is 1, the input picture filtering strength control value array StrengthControlVal[idx] should contain real numbers in the range of 0 to 1, and the index idx may have a range of 0 to numInputPics-1.

[0114] The input picture with index 0 may be the picture for which the NNPF defined by an NNPFC SEI message has been activated by an NNPFA SEI message. An input picture with index i in the range of 1 to numInputPics-1 may precede the input picture with index i-1 in output order.

[0115] nnpfc_purpose can indicate the purpose of the NNPF as shown in Table 4. A non-zero value of (nnpfc_purpose & bitMask) indicates that the NNPF has the purpose associated with the bitMask value in Table 4. If nnpfc_purpose is greater than 0 and (nnpfc_purpose & bitMask) is equal to 0, the purpose associated with the bitMask value is not applicable to the NNPF. If nnpfc_purpose is equal to 0, the NNPF may be used at the discretion of the application. The value of nnpfc_purpose may be restricted to the range of 0 to 63 in the bitstream. Values of nnpfc_purpose in the range of 64 to 65535 may be reserved for future use. The decoder must ignore NNPFC SEI messages with nnpfc_purpose in the range of 64 to 65535.
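
The range rules and the bitMask test described above can be sketched in Python as follows. The function names and the returned category strings are hypothetical and chosen for this example. The bitMask values themselves are listed in Table 4 and are not reproduced here, so the mask is passed in as a parameter rather than hard-coded.

```python
def classify_nnpfc_purpose(nnpfc_purpose: int) -> str:
    """Classify an nnpfc_purpose value by the ranges described in the text."""
    if not 0 <= nnpfc_purpose <= 65535:
        raise ValueError("nnpfc_purpose out of range")
    if nnpfc_purpose >= 64:
        return "reserved"  # a decoder ignores the NNPFC SEI message
    if nnpfc_purpose == 0:
        return "application-determined"
    return "signalled"


def purpose_applies(nnpfc_purpose: int, bit_mask: int) -> bool:
    """True if the purpose associated with bit_mask applies to the NNPF."""
    return (nnpfc_purpose & bit_mask) != 0
```

With a purely illustrative mask of 0x04, purpose_applies(0b000110, 0x04) is true while purpose_applies(0b000110, 0x01) is false, since only bits 1 and 2 are set in the example value.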

[0116] [Table 4]

[0117] The following variables may be derived as shown in Table 5 below: chromaUpsamplingFlag, which indicates whether the purpose of the NNPF indicated by nnpfc_purpose includes chroma upsampling; resolutionResamplingFlag, which indicates whether it includes resolution resampling; pictureRateUpsamplingFlag, which indicates whether it includes picture rate upsampling; bitDepthUpsamplingFlag, which indicates whether it includes bit depth upsampling; and colourizationFlag, which indicates whether it includes colourization.

[0118] [Table 5]

[0119] If ChromaFormatIdc is equal to 3, chromaUpsamplingFlag may be restricted to be equal to 0. If ChromaFormatIdc or chromaUpsamplingFlag is not equal to 0, colourizationFlag may be restricted to be equal to 0. If pictureRateUpsamplingFlag is equal to 1 and the input picture with index 0 is associated with a frame packing arrangement SEI message with fp_arrangement_type equal to 5, then all input pictures may be associated with a frame packing arrangement SEI message with fp_arrangement_type equal to 5 and the same value of fp_current_frame_is_frame0_flag.

[0120] nnpfc_id may contain an identification number that can be used to identify an NNPF. The value of nnpfc_id must be within the range of 0 to 2^32 - 2. Values of nnpfc_id in the range of 256 to 511 and in the range of 2^31 to 2^32 - 2 may be reserved for future use. Decoders must ignore NNPFC SEI messages with an nnpfc_id in the range of 256 to 511 or in the range of 2^31 to 2^32 - 2.
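
A decoder-side check of these nnpfc_id ranges could look like the following minimal Python sketch. The function name nnpfc_id_is_reserved is hypothetical; the numeric bounds are the ones stated above (valid values from 0 to 2^32 - 2, with 256 to 511 and 2^31 to 2^32 - 2 reserved).

```python
def nnpfc_id_is_reserved(nnpfc_id: int) -> bool:
    """True if an NNPFC SEI message with this nnpfc_id is to be ignored."""
    if not 0 <= nnpfc_id <= 2**32 - 2:
        raise ValueError("nnpfc_id outside the permitted range")
    return 256 <= nnpfc_id <= 511 or 2**31 <= nnpfc_id <= 2**32 - 2
```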

[0121] If an NNPFC SEI message is the first NNPFC SEI message, in decoding order, that has a particular nnpfc_id value within the current CLVS, the following may apply:

[0122] - This SEI message can specify a base NNPF.

[0123] - This SEI message may pertain to the current decoded picture and all subsequent decoded pictures of the current layer, in output order, until the end of the current CLVS.

[0124] An NNPFC SEI message may be a repetition of a previous NNPFC SEI message in the CLVS in the decoding order, and the subsequent semantics may be applied as if this SEI message were the only NNPFC SEI message in the CLVS that has the same content.

[0125] A value of 0 for nnpfc_mode_idc indicates that the SEI message contains a bitstream representing the base NNPF, or that it represents an update to the base NNPF having the same nnpfc_id value.

[0126] If an NNPFC SEI message is the first NNPFC SEI message, in decoding order, that has a particular nnpfc_id value within the current CLVS, a value of 1 for nnpfc_mode_idc may indicate that the base NNPF associated with the nnpfc_id value is a neural network identified by the URI given in nnpfc_uri, in the format identified by the tag URI nnpfc_tag_uri.

[0127] If an NNPFC SEI message is neither the first NNPFC SEI message, in decoding order, that has a particular nnpfc_id value within the current CLVS, nor a repetition of that first NNPFC SEI message, a value of 1 for nnpfc_mode_idc may indicate that an update to the base NNPF having the same nnpfc_id value is defined by the URI given in nnpfc_uri, in the format identified by the tag URI nnpfc_tag_uri.

[0128] A value of 1 for nnpfc_base_flag indicates that the SEI message specifies a base NNPF. A value of 0 for nnpfc_base_flag indicates that the SEI message specifies an update to a base NNPF.

[0129] The following restrictions may apply to the value of nnpfc_base_flag:

[0130] - If an NNPFC SEI message is the first NNPFC SEI message, in decoding order, that has a particular nnpfc_id value within the current CLVS, the value of nnpfc_base_flag must be equal to 1.

[0131] - If an NNPFC SEI message nnpfcB is not the first NNPFC SEI message, in decoding order, that has a particular nnpfc_id value within the current CLVS, and the value of nnpfc_base_flag is equal to 1, then the NNPFC SEI message may be a repetition of the first NNPFC SEI message nnpfcA, in decoding order, that has the same nnpfc_id. That is, the payload contents of nnpfcB must be the same as the payload contents of nnpfcA.

[0132] If the value of nnpfc_base_flag is 0, the following restrictions may apply:

[0133] - This SEI message can define an update relative to the preceding base NNPF, in decoding order, having the same nnpfc_id value. Updates are not cumulative; rather, each update may be applied to the base NNPF, which is the NNPF specified by the first NNPFC SEI message, in decoding order, that has the particular nnpfc_id value within the current CLVS. The NNPF defined by this SEI message may be obtained by applying the update defined by this SEI message to the base NNPF having the same nnpfc_id value.

[0134] - This SEI message may pertain to the current decoded picture and all subsequent decoded pictures of the current layer, in output order, until the end of the current CLVS, or up to but excluding the decoded picture that follows the current decoded picture in output order within the current CLVS and is associated with a subsequent NNPFC SEI message, in decoding order, having nnpfc_base_flag equal to 0 and that particular nnpfc_id value within the current CLVS, whichever is earlier.
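
The non-cumulative update rule can be illustrated with the following minimal Python sketch. It is a toy model under stated assumptions: the class name NnpfRegistry is hypothetical, and a neural network is modelled simply as a dictionary of named parameters, with an update overriding some of them. An actual NNPF payload would be a neural network representation, not a dictionary.

```python
class NnpfRegistry:
    """Tracks the base NNPF and the currently defined NNPF per nnpfc_id."""

    def __init__(self):
        self.base = {}
        self.current = {}

    def on_nnpfc(self, nnpfc_id, nnpfc_base_flag, params):
        if nnpfc_base_flag == 1:
            if nnpfc_id in self.base:
                # A later base message must repeat the first one exactly.
                if params != self.base[nnpfc_id]:
                    raise ValueError("repetition differs from the base NNPF")
                return
            self.base[nnpfc_id] = dict(params)
            self.current[nnpfc_id] = dict(params)
        else:
            if nnpfc_id not in self.base:
                raise ValueError("update received before its base NNPF")
            # Non-cumulative: start again from the base, not from self.current.
            self.current[nnpfc_id] = {**self.base[nnpfc_id], **params}
```

In this model, if a base with parameters w1 = 1 and w2 = 2 is followed by an update setting w1 = 10 and then by an update setting w2 = 20, the NNPF defined after the second update has w1 = 1 and w2 = 20, because the first update is not carried over.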

[0135] A value of 0 for nnpfc_mode_idc can indicate that the SEI message contains a bitstream indicating a base NNPF (if the value of nnpfc_base_flag is 1), or that it is an update to a base NNPF having the same nnpfc_id value (if the value of nnpfc_base_flag is 0). If the value of nnpfc_base_flag is 1, an nnpfc_mode_idc identical to 1 can indicate that the base NNPF associated with the nnpfc_id value is related to a neural network identified by a URI, where the URI may be indicated by an nnpfc_uri in the format indicated by a tag URI nnpfc_tag_uri. If the value of nnpfc_base_flag is 0, an nnpfc_mode_idc identical to 1 can indicate that an update related to a base NNPF with the same nnpfc_id value is defined by a URI, where the URI may be indicated by an nnpfc_uri in the format indicated by a tag URI nnpfc_tag_uri.

[0136] The value of nnpfc_mode_idc may be restricted to the range of 0 to 1 in the bitstream. Values of nnpfc_mode_idc in the range of 2 to 255 may be reserved for future use and must not be present in the bitstream. The decoder must ignore NNPFC SEI messages with nnpfc_mode_idc in the range of 2 to 255. Values of nnpfc_mode_idc greater than 255 must not be present in the bitstream and are not reserved for future use.

[0137] nnpfc_reserved_zero_bit_a may be restricted to have the same value as 0 by bitstream restrictions. The decoder may be restricted to ignore NNPFC SEI messages where the value of nnpfc_reserved_zero_bit_a is not 0.

[0138] nnpfc_tag_uri may contain a tag URI, with syntax and semantics as specified in IETF RFC 4151, that identifies the neural network used as the base NNPF, or as an update to the base NNPF having the same nnpfc_id value, specified by nnpfc_uri. Using nnpfc_tag_uri, the format of the neural network data specified by nnpfc_uri can be uniquely identified without a central registration authority. nnpfc_tag_uri equal to "tag:iso.org,2023:15938-17" can indicate that the neural network data identified by nnpfc_uri complies with ISO / IEC 15938-17.

[0139] nnpfc_uri may contain a URI having syntax and semantics specified in IETF Internet Standard 66 that identifies a neural network used as a base NNPF or an update associated with a base NNPF that uses the same nnpfc_id value.

[0140] A value of 1 for nnpfc_property_present_flag can indicate the presence of syntax elements related to filtering purposes, input formatting, output formatting, and complexity. A value of 0 for nnpfc_property_present_flag can indicate the absence of syntax elements related to filtering purposes, input formatting, output formatting, and complexity. If the value of nnpfc_base_flag is 1, nnpfc_property_present_flag may be restricted to having a value of 1. If the value of nnpfc_property_present_flag is 0, the values of all syntax elements that would be present only if nnpfc_property_present_flag were 1 may be inferred to be identical to the values of the corresponding syntax elements in the NNPFC SEI message containing the base NNPF to which this SEI message provides an update.

[0141] The following restrictions may apply if an NNPFC SEI message nnpfcCurr is neither the first NNPFC SEI message, in decoding order, that has a particular nnpfc_id value within the current CLVS, nor a repetition of that first NNPFC SEI message (i.e., the value of nnpfc_base_flag is 0), and the value of nnpfc_property_present_flag is 1.

[0142] - The value of nnpfc_purpose in nnpfcCurr must be identical to the value of nnpfc_purpose in the first NNPFC SEI message, in decoding order, that has the particular nnpfc_id value within the current CLVS.

[0143] - The values of the syntax elements following nnpfc_base_flag and preceding nnpfc_complexity_info_present_flag in nnpfcCurr must be identical to the values of the corresponding syntax elements in the first NNPFC SEI message, in decoding order, that has the particular nnpfc_id value within the current CLVS.

[0144] - Either nnpfc_complexity_info_present_flag in nnpfcCurr must be equal to 0, or nnpfc_complexity_info_present_flag must be equal to 1 in both nnpfcCurr and the first NNPFC SEI message, in decoding order, that has the particular nnpfc_id value within the current CLVS (referred to below as nnpfcBase), in which case the following may apply:

[0145] (1) The nnpfc_parameter_type_idc in nnpfcCurr must be identical to the nnpfc_parameter_type_idc in nnpfcBase.

[0146] (2) If nnpfc_log2_parameter_bit_length_minus3 exists in nnpfcCurr, then nnpfc_log2_parameter_bit_length_minus3 in nnpfcCurr must be less than or equal to nnpfc_log2_parameter_bit_length_minus3 in nnpfcBase.

[0147] (3) If nnpfc_num_parameters_idc in nnpfcBase is the same as 0, then nnpfc_num_parameters_idc in nnpfcCurr must also be the same as 0.

[0148] (4) Otherwise (if nnpfc_num_parameters_idc in nnpfcBase is greater than 0), nnpfc_num_parameters_idc in nnpfcCurr must be greater than 0 and less than or equal to nnpfc_num_parameters_idc in nnpfcBase.

[0149] (5) If nnpfc_num_kmac_operations_idc in nnpfcBase is equal to 0, then nnpfc_num_kmac_operations_idc in nnpfcCurr must also be equal to 0.

[0150] (6) Otherwise (if nnpfc_num_kmac_operations_idc in nnpfcBase is greater than 0), nnpfc_num_kmac_operations_idc in nnpfcCurr must be greater than 0 and less than or equal to nnpfc_num_kmac_operations_idc in nnpfcBase.

[0151] (7) If nnpfc_total_kilobyte_size in nnpfcBase is equal to 0, then nnpfc_total_kilobyte_size in nnpfcCurr must also be equal to 0.

[0152] (8) Otherwise (if nnpfc_total_kilobyte_size in nnpfcBase is greater than 0), nnpfc_total_kilobyte_size in nnpfcCurr must be greater than 0 and less than or equal to nnpfc_total_kilobyte_size in nnpfcBase.
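
Items (1) to (8) can be checked mechanically, as in the following minimal Python sketch. The function names are hypothetical, the two SEI messages are modelled as plain dictionaries keyed by syntax element name, and the single-word spelling nnpfc_parameter_type_idc is used as the key for item (1). The sketch also assumes that items (4), (6), and (8) all mean "greater than 0 and no larger than the value in nnpfcBase".

```python
def _bounded_by_base(base_value: int, curr_value: int) -> bool:
    """Shared pattern of items (3) to (8)."""
    if base_value == 0:
        return curr_value == 0
    return 0 < curr_value <= base_value


def complexity_violations(base: dict, curr: dict) -> list:
    """Return the names of syntax elements in curr that break a constraint."""
    violations = []
    if curr["nnpfc_parameter_type_idc"] != base["nnpfc_parameter_type_idc"]:
        violations.append("nnpfc_parameter_type_idc")
    bit_len = "nnpfc_log2_parameter_bit_length_minus3"
    # Item (2) only applies when the element is present.
    if bit_len in curr and bit_len in base and curr[bit_len] > base[bit_len]:
        violations.append(bit_len)
    for name in ("nnpfc_num_parameters_idc",
                 "nnpfc_num_kmac_operations_idc",
                 "nnpfc_total_kilobyte_size"):
        if not _bounded_by_base(base[name], curr[name]):
            violations.append(name)
    return violations
```

An empty list means that, under these assumptions, the update does not claim more complexity than the base NNPF it refers to.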

[0153] nnpfc_num_input_pics_minus1+1 can indicate the number of decoded output pictures used as input to NNPF. The value of nnpfc_num_input_pics_minus1 may be restricted to be within the range of 0 to 63. If the value of pictureRateUpsamplingFlag is 1, the value of nnpfc_num_input_pics_minus1 may be restricted to be greater than 0.

[0154] The variable numInputPics, which indicates the number of pictures used as input to NNPF, can be derived as shown in Equation 1.

[0155] [Equation 1]

[0156] A value of 1 for nnpfc_input_pic_output_flag[i] indicates that NNPF will generate a corresponding output picture for the i-th input picture. A value of 0 for nnpfc_input_pic_output_flag[i] indicates that NNPF will not generate a corresponding output picture for the i-th input picture. If the value of nnpfc_num_input_pics_minus1 is 0, then the value of nnpfc_input_pic_output_flag[0] may be inferred to be 1. If the value of pictureRateUpsamplingFlag is the same as 0 and the value of nnpfc_num_input_pics_minus1 is greater than 0, then nnpfc_input_pic_output_flag[i] may be restricted to being the same as 1 for at least one value of i in the range 0 to nnpfc_num_input_pics_minus1. nnpfc_input_pic_output_flag[i] may be called nnpfc_input_pic_filtering_flag[i].
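
The counting and inference rules for the input pictures can be sketched in Python as follows. The function names are hypothetical. The sketch takes numInputPics to be nnpfc_num_input_pics_minus1 + 1, which follows from the semantics of nnpfc_num_input_pics_minus1 given above, and it raises an error where the text describes a restriction.

```python
def derive_num_input_pics(nnpfc_num_input_pics_minus1: int,
                          picture_rate_upsampling_flag: int) -> int:
    """Derive numInputPics and check the stated range restrictions."""
    if not 0 <= nnpfc_num_input_pics_minus1 <= 63:
        raise ValueError("nnpfc_num_input_pics_minus1 must be in 0..63")
    if picture_rate_upsampling_flag and nnpfc_num_input_pics_minus1 == 0:
        raise ValueError("picture rate upsampling needs more than one input")
    return nnpfc_num_input_pics_minus1 + 1


def resolve_output_flags(num_input_pics: int,
                         picture_rate_upsampling_flag: int,
                         signalled_flags=None) -> list:
    """Return nnpfc_input_pic_output_flag[i] for every input picture."""
    if num_input_pics == 1:
        return [1]  # inferred to be 1 when there is a single input picture
    if signalled_flags is None or len(signalled_flags) != num_input_pics:
        raise ValueError("one flag per input picture is required")
    if not picture_rate_upsampling_flag and not any(signalled_flags):
        raise ValueError("at least one flag must be 1")
    return list(signalled_flags)
```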

[0157] A value of 1 for nnpfc_absent_input_pic_zero_flag indicates that NNPF expects input pictures that are not present in the bitstream to be represented by a sample array with sample values of 0. A value of 0 for nnpfc_absent_input_pic_zero_flag indicates that NNPF expects input pictures that are not present in the bitstream to be represented by the input picture that is closest to it in output order within the bitstream.
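
The two substitution behaviours can be sketched in Python as follows. This is an illustration under stated assumptions: the function name fill_absent_inputs is hypothetical, each picture is modelled as a single two-dimensional list of samples, an absent picture is represented by None, and distance in the list index is used as a stand-in for distance in output order. The tie-breaking rule (the lower index wins) is also an assumption, since the text does not state one.

```python
def fill_absent_inputs(pics, nnpfc_absent_input_pic_zero_flag):
    """Replace absent pictures (None) by zeros or by the nearest present one."""
    present = [i for i, pic in enumerate(pics) if pic is not None]
    if not present:
        raise ValueError("at least one input picture must be present")
    height = len(pics[present[0]])
    width = len(pics[present[0]][0])
    filled = []
    for i, pic in enumerate(pics):
        if pic is not None:
            filled.append(pic)
        elif nnpfc_absent_input_pic_zero_flag == 1:
            filled.append([[0] * width for _ in range(height)])
        else:
            # Assumption: ties are resolved in favour of the lower index.
            nearest = min(present, key=lambda j: abs(j - i))
            filled.append(pics[nearest])
    return filled
```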

[0158] nnpfc_out_sub_c_flag can indicate the values of the variables outSubWidthC and outSubHeightC when the value of chromaUpsamplingFlag is 1. A value of 1 for nnpfc_out_sub_c_flag indicates that the value of outSubWidthC is 1 and the value of outSubHeightC is 1. A value of 0 for nnpfc_out_sub_c_flag indicates that the value of outSubWidthC is 2 and the value of outSubHeightC is 1. If the value of ChromaFormatIdc is 2 and nnpfc_out_sub_c_flag is present, the value of nnpfc_out_sub_c_flag must be equal to 1.

[0159] nnpfc_out_colour_format_idc can indicate the colour format of the NNPF output and the resulting values of the variables outSubWidthC and outSubHeightC when the value of colourizationFlag is 1. A value of 1 for nnpfc_out_colour_format_idc indicates that the NNPF output colour format is 4:2:0, and both outSubWidthC and outSubHeightC are equal to 2. A value of 2 for nnpfc_out_colour_format_idc indicates that the NNPF output colour format is 4:2:2, outSubWidthC is 2, and outSubHeightC is 1. A value of 3 for nnpfc_out_colour_format_idc indicates that the NNPF output colour format is 4:4:4, and both outSubWidthC and outSubHeightC are 1. The value of nnpfc_out_colour_format_idc may be restricted to not be equal to 0. If both chromaUpsamplingFlag and colourizationFlag are equal to 0, then outSubWidthC and outSubHeightC can be inferred to be equal to SubWidthC and SubHeightC, respectively.
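
The derivation of outSubWidthC and outSubHeightC can be summarised by the following minimal Python sketch. The function name derive_out_sub_c is hypothetical. The sketch assumes that nnpfc_out_sub_c_flag is consulted when chromaUpsamplingFlag is 1 and that nnpfc_out_colour_format_idc is consulted when colourizationFlag is 1, the two flags not being set at the same time.

```python
def derive_out_sub_c(chroma_upsampling_flag, colourization_flag,
                     sub_width_c, sub_height_c,
                     nnpfc_out_sub_c_flag=None,
                     nnpfc_out_colour_format_idc=None):
    """Return (outSubWidthC, outSubHeightC)."""
    if colourization_flag:
        # 1 -> 4:2:0, 2 -> 4:2:2, 3 -> 4:4:4; the value 0 is not allowed.
        table = {1: (2, 2), 2: (2, 1), 3: (1, 1)}
        if nnpfc_out_colour_format_idc not in table:
            raise ValueError("nnpfc_out_colour_format_idc must be 1, 2 or 3")
        return table[nnpfc_out_colour_format_idc]
    if chroma_upsampling_flag:
        return (1, 1) if nnpfc_out_sub_c_flag == 1 else (2, 1)
    # Neither purpose applies: inherit the input chroma subsampling.
    return (sub_width_c, sub_height_c)
```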

[0160] nnpfc_pic_width_num_minus1+1 and nnpfc_pic_width_denom_minus1+1 can represent the numerator and denominator, respectively, of the resampling ratio of the NNPF output picture width relative to CroppedWidth. The value obtained by dividing (nnpfc_pic_width_num_minus1+1) by (nnpfc_pic_width_denom_minus1+1) must be within the range of 1 / 16 to 16. If nnpfc_pic_width_num_minus1 and nnpfc_pic_width_denom_minus1 are not present, they may both be inferred to be 0.

[0161] The variable nnpfcOutputPicWidth indicates the width of the luma sample array of the picture corresponding to the result of applying the NNPF identified by nnpfc_id to the input picture, and may be derived as shown in Equation 2.

[0162] [Equation 2]

[0163] The remainder obtained by dividing nnpfcOutputPicWidth by outSubWidthC must be equal to 0.

[0164] nnpfc_pic_height_num_minus1+1 and nnpfc_pic_height_denom_minus1+1 can represent the numerator and denominator, respectively, of the resampling ratio of the NNPF output picture height relative to CroppedHeight. The value obtained by dividing (nnpfc_pic_height_num_minus1+1) by (nnpfc_pic_height_denom_minus1+1) must be within the range of 1 / 16 to 16. If nnpfc_pic_height_num_minus1 and nnpfc_pic_height_denom_minus1 are not present, they may both be inferred to be 0.

[0165] The variable nnpfcOutputPicHeight indicates the height of the luma sample array of the picture corresponding to the result of applying the NNPF identified by nnpfc_id to the input picture, and may be derived as shown in Equation 3.

[0166] [Equation 3]

[0167] The remainder obtained by dividing nnpfcOutputPicHeight by outSubHeightC must be equal to 0.

[0168] If nnpfc_pic_width_num_minus1, nnpfc_pic_width_denom_minus1, nnpfc_pic_height_num_minus1, and nnpfc_pic_height_denom_minus1 exist, then at least one of the following restrictions may be true:

[0169] - The value of nnpfcOutputPicWidth is not the same as CroppedWidth.

[0170] - The value of nnpfcOutputPicHeight is not the same as CroppedHeight.
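The output picture size derivation and the associated constraints may be sketched as follows. This is an illustrative sketch only: Equations 2 and 3 are not reproduced in this text, so scaling CroppedWidth and CroppedHeight by the signaled ratio and rounding up is an assumption, and the function name output_pic_size is hypothetical.

```python
from fractions import Fraction
from math import ceil


def output_pic_size(cropped_w, cropped_h,
                    w_num_minus1, w_denom_minus1,
                    h_num_minus1, h_denom_minus1,
                    out_sub_width_c, out_sub_height_c):
    """Return (nnpfcOutputPicWidth, nnpfcOutputPicHeight) or raise ValueError."""
    # Resampling ratios as exact fractions (numerator + 1) / (denominator + 1).
    ratio_w = Fraction(w_num_minus1 + 1, w_denom_minus1 + 1)
    ratio_h = Fraction(h_num_minus1 + 1, h_denom_minus1 + 1)

    # ASSUMPTION: the scaled size is rounded up to an integer.
    out_w = ceil(cropped_w * ratio_w)
    out_h = ceil(cropped_h * ratio_h)

    # The output size must be a multiple of the chroma subsampling factors.
    if out_w % out_sub_width_c != 0 or out_h % out_sub_height_c != 0:
        raise ValueError("output size not divisible by outSubWidthC/outSubHeightC")

    # When the ratio syntax elements are present, at least one dimension changes.
    if out_w == cropped_w and out_h == cropped_h:
        raise ValueError("neither width nor height differs from the cropped size")
    return out_w, out_h


print(output_pic_size(1920, 1080, 1, 0, 1, 0, 2, 2))
```

Exact fractions are used here so that the rounding step is not affected by floating-point error.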

[0171] nnpfc_interpolated_pics[i] can indicate the number of interpolated pictures generated by NNPF between the i-th picture and the (i+1)-th picture used as input to NNPF. The value of nnpfc_interpolated_pics[i] may be restricted to the range of 0 to 63. The value of nnpfc_interpolated_pics[i] may be restricted to being greater than 0 for at least one i in the range of 0 to nnpfc_num_input_pics_minus1-1.

[0172] The variables NumInpPicsInOutputTensor, which indicates the number of pictures that have a corresponding input picture and exist in the NNPF output tensor; InpIdx[idx], which indicates the input picture index of the idx-th picture that has a corresponding input picture and exists in the NNPF output tensor; and numOutputPics, which indicates the total number of pictures in the NNPF output tensor, may be derived as shown in Table 6.

[0173] [Table 6]

[0174] A value of 1 for nnpfc_component_last_flag indicates that the last dimension of the input tensor for the NNPF and of the output tensor outputTensor (the result of the NNPF) is used for the current channel. A value of 0 for nnpfc_component_last_flag indicates that the third dimension of the input tensor for the NNPF and of the output tensor outputTensor is used for the current channel. The first dimension of the input and output tensors may be used as a batch index, as in some neural network frameworks. The formulas in the semantics of this SEI message use the batch size corresponding to a batch index of 0, but the batch size used as input for neural network inference may be determined by the post-processing implementation. For example, when the value of nnpfc_inp_order_idc is equal to 3 and the value of nnpfc_auxiliary_inp_idc is equal to 1, the input tensor may have 7 channels, including 4 luma matrices, 2 chroma matrices, and 1 auxiliary input matrix. In this case, the DeriveInputTensors() process can derive each of the seven channels of the input tensor one by one, and the channel being processed at a given point in the process may be called the current channel.

[0175] nnpfc_inp_format_idc can indicate how to convert the sample values of the input picture into NNPF input values. When the value of nnpfc_inp_format_idc is 0, the NNPF input values may be real numbers, and the functions InpY() and InpC() can be expressed as shown in Equation 4.

[0176] [Equation 4]

[0177] If the value of nnpfc_inp_format_idc is 1, the input values of NNPF are unsigned integer numbers, and the functions InpY() and InpC() may be derived as shown in Table 7.

[0178] [Table 7]

[0179] The variable inpTensorBitDepthY may be derived from the syntax element nnpfc_inp_tensor_luma_bitdepth_minus8 described below, and the variable inpTensorBitDepthC may be derived from the syntax element nnpfc_inp_tensor_chroma_bitdepth_minus8 described below. Values of nnpfc_inp_format_idc greater than 1 may be reserved for future use and shall not be present in the bitstream. The decoder shall ignore NNPFC SEI messages containing reserved values of nnpfc_inp_format_idc.

[0180] A value of nnpfc_auxiliary_inp_idc greater than 0 indicates the presence of auxiliary input data in the NNPF input tensor. A value of 0 for nnpfc_auxiliary_inp_idc indicates the absence of auxiliary input data in the input tensor. A value of 1 for nnpfc_auxiliary_inp_idc indicates that the auxiliary input data is derived in the manner shown in Tables 10 to 12. The value of nnpfc_auxiliary_inp_idc must be in the range of 0 to 1 in the bitstream. Values of nnpfc_auxiliary_inp_idc from 2 to 255 may be reserved for future use and shall not be present in the bitstream. The decoder must ignore NNPFC SEI messages with nnpfc_auxiliary_inp_idc in the range of 2 to 255. Values of nnpfc_auxiliary_inp_idc greater than 255 shall not be present in the bitstream and are not reserved for future use.

[0181] nnpfc_inp_order_idc indicates how the sample arrays of the input pictures are ordered to form the input tensor for the NNPF. The value of nnpfc_inp_order_idc must be in the range of 0 to 3 in the bitstream. Values of nnpfc_inp_order_idc from 4 to 255 may be reserved for future use and shall not be present in the bitstream. The decoder must ignore NNPFC SEI messages with nnpfc_inp_order_idc in the range of 4 to 255. Values of nnpfc_inp_order_idc greater than 255 shall not be present in the bitstream and are not reserved for future use. The value of nnpfc_inp_order_idc must not be 3 if the value of ChromaFormatIdc is not 1. The value of nnpfc_inp_order_idc must be 0 if the value of ChromaFormatIdc is 0. If the value of chromaUpsamplingFlag is 1, the value of nnpfc_inp_order_idc must not be 0.
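The handling of defined, reserved, and disallowed values described for nnpfc_inp_order_idc may be sketched as follows. This is a minimal sketch under stated assumptions: the function names classify_idc and check_inp_order_idc and the returned labels are hypothetical, and only the ChromaFormatIdc and chromaUpsamplingFlag conditions that exclude a value are modeled.

```python
def classify_idc(value, max_defined, max_reserved=255):
    """Classify a signaled idc value.

    'valid'     : value has defined semantics
    'reserved'  : reserved for future use; the decoder ignores the SEI message
    'forbidden' : shall not be present and is not reserved
    """
    if value < 0 or value > max_reserved:
        return "forbidden"
    if value > max_defined:
        return "reserved"
    return "valid"


def check_inp_order_idc(inp_order_idc, chroma_format_idc, chroma_upsampling_flag):
    """Apply the range rule and two of the cross-element constraints."""
    status = classify_idc(inp_order_idc, max_defined=3)
    if status != "valid":
        return status
    # Value 3 is only allowed when ChromaFormatIdc is 1.
    if chroma_format_idc != 1 and inp_order_idc == 3:
        return "forbidden"
    # Value 0 is not allowed when chroma upsampling is in use.
    if chroma_upsampling_flag == 1 and inp_order_idc == 0:
        return "forbidden"
    return "valid"


print(check_inp_order_idc(3, 1, 0))   # valid
print(check_inp_order_idc(4, 1, 0))   # reserved
```

The same classify_idc helper could be reused for other idc syntax elements that follow the defined/reserved/not-present pattern, with a different max_defined.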

[0182] Table 8 provides an explanation of the nnpfc_inp_order_idc value.

[0183] [Table 8]

[0184] nnpfc_inp_tensor_luma_bitdepth_minus8+8 can indicate the bit depth of the luma sample values in the input integer tensor. The value of inpTensorBitDepthY can be derived as shown in Equation 5.

[0185] [Equation 5] inpTensorBitDepthY = nnpfc_inp_tensor_luma_bitdepth_minus8 + 8

[0186] The value of nnpfc_inp_tensor_luma_bitdepth_minus8 may be restricted to be within the range of 0 to 24.

[0187] nnpfc_inp_tensor_chroma_bitdepth_minus8+8 can indicate the bit depth of the chroma sample values in the input integer tensor. The value of inpTensorBitDepthC may be derived as shown in Equation 6.

[0188] [Equation 6] inpTensorBitDepthC = nnpfc_inp_tensor_chroma_bitdepth_minus8 + 8

[0189] The value of nnpfc_inp_tensor_chroma_bitdepth_minus8 may be restricted to be within the range of 0 to 24.
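As a small illustration of the "minus8" coding used for the tensor bit depths, the derivation and range restriction above may be sketched as follows. The function name tensor_bit_depth is hypothetical.

```python
def tensor_bit_depth(bitdepth_minus8):
    """Derive a tensor bit depth from a *_bitdepth_minus8 syntax element."""
    # The syntax element is restricted to the range 0..24,
    # which corresponds to bit depths 8..32.
    if not 0 <= bitdepth_minus8 <= 24:
        raise ValueError("bitdepth_minus8 out of range 0..24")
    return bitdepth_minus8 + 8


inp_tensor_bit_depth_y = tensor_bit_depth(2)   # luma: 10 bits
inp_tensor_bit_depth_c = tensor_bit_depth(0)   # chroma: 8 bits
print(inp_tensor_bit_depth_y, inp_tensor_bit_depth_c)
```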

[0190] When the value of nnpfc_auxiliary_inp_idc is the same as 1, the variable strengthControlScaledVal may be derived as shown in Table 9.

[0191] [Table 9]

[0192] The patch may be a rectangular array of samples from the components of the picture (e.g., luma or chroma components).

[0193] The process DeriveInputTensors(), which derives the input tensor inputTensor for a given vertical sample coordinate cTop and horizontal sample coordinate cLeft specifying the top-left sample position of the sample patch included in the input tensor, may be expressed as the combination of Tables 10 to 12.

[0194] [Table 10]

[0195] [Table 11]

[0196] [Table 12]

[0197] A value of 0 for nnpfc_out_format_idc can indicate that the sample values output by the NNPF are real numbers whose value range of 0 to 1 is linearly mapped to the unsigned integer value range of 0 to (1 << bitDepth) - 1, for any bit depth bitDepth required for subsequent post-processing or display. A value of 1 for nnpfc_out_format_idc can indicate that the luma sample values output by the NNPF are unsigned integers in the range of 0 to (1 << outTensorBitDepthY) - 1, and that the chroma sample values output by the NNPF are unsigned integers in the range of 0 to (1 << outTensorBitDepthC) - 1. Values of nnpfc_out_format_idc greater than 1 may be reserved for future use and shall not be present in the bitstream. The decoder must ignore NNPFC SEI messages that contain reserved values of nnpfc_out_format_idc.
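For the real-valued output format, the linear mapping from the range 0 to 1 onto the range 0 to (1 << bitDepth) - 1 may be sketched as follows. Clipping out-of-range values and rounding half up are assumptions made for this sketch and are not stated in the text; the function name real_output_to_sample is hypothetical.

```python
def real_output_to_sample(value, bit_depth):
    """Map a real-valued NNPF output in [0, 1] to an unsigned integer sample."""
    max_val = (1 << bit_depth) - 1
    # ASSUMPTION: values slightly outside [0, 1] are clipped before mapping.
    clipped = min(max(value, 0.0), 1.0)
    # ASSUMPTION: round half up to the nearest integer code value.
    return int(clipped * max_val + 0.5)


print(real_output_to_sample(1.0, 10))   # 1023
print(real_output_to_sample(0.5, 8))    # 128
```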

[0198] nnpfc_out_order_idc can indicate the output order of the samples output from the NNPF. The value of nnpfc_out_order_idc must be in the range of 0 to 3 in the bitstream. Values of nnpfc_out_order_idc from 4 to 255 may be reserved for future use and shall not be present in the bitstream. The decoder must ignore NNPFC SEI messages with nnpfc_out_order_idc in the range of 4 to 255. Values of nnpfc_out_order_idc greater than 255 shall not be present in the bitstream and are not reserved for future use. If the value of chromaUpsamplingFlag is 1, the value of nnpfc_out_order_idc must not be equal to 0 or 3. If the value of colourizationFlag is 1, the value of nnpfc_out_order_idc must not be equal to 0.

[0199] Table 13 provides an explanation of the values of nnpfc_out_order_idc.

[0200] [Table 13]

[0201] nnpfc_out_tensor_luma_bitdepth_minus8+8 can indicate the bit depth of the luma sample values in the output integer tensor. The value of nnpfc_out_tensor_luma_bitdepth_minus8 must be in the range of 0 to 24. The value of outTensorBitDepthY can be derived as shown in Equation 7.

[0202] [Equation 7] outTensorBitDepthY = nnpfc_out_tensor_luma_bitdepth_minus8 + 8

[0203] nnpfc_out_tensor_chroma_bitdepth_minus8+8 can indicate the bit depth of the chroma sample values in the output integer tensor. The value of nnpfc_out_tensor_chroma_bitdepth_minus8 must be in the range of 0 to 24. The value of outTensorBitDepthC can be derived as shown in Equation 8.

[0204] [Equation 8] outTensorBitDepthC = nnpfc_out_tensor_chroma_bitdepth_minus8 + 8

[0205] When the value of bitDepthUpsamplingFlag is 1, the value of nnpfc_out_format_idc must be the same as 1, and at least one of the following restrictions may be true:

[0206] - nnpfc_out_tensor_luma_bitdepth_minus8 is present, and outTensorBitDepthY is greater than BitDepthY.

[0207] - nnpfc_out_tensor_chroma_bitdepth_minus8 is present, and outTensorBitDepthC is greater than BitDepthC.

[0208] If nnpfc_inp_tensor_luma_bitdepth_minus8, nnpfc_inp_tensor_chroma_bitdepth_minus8, nnpfc_out_tensor_luma_bitdepth_minus8, and nnpfc_out_tensor_chroma_bitdepth_minus8 are present and outTensorBitDepthY is greater than inpTensorBitDepthY, outTensorBitDepthC must not be less than inpTensorBitDepthC. Likewise, if these four syntax elements are present and outTensorBitDepthC is greater than inpTensorBitDepthC, outTensorBitDepthY must not be less than inpTensorBitDepthY.
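The pairing constraint between the luma and chroma tensor bit depths (one component must not lose bit depth while the other gains it) may be sketched as the following check. The function name bit_depth_pairing_ok is hypothetical, and the check assumes all four bit-depth syntax elements are present.

```python
def bit_depth_pairing_ok(inp_y, inp_c, out_y, out_c):
    """Return True if the input/output tensor bit depths are consistent."""
    # Luma bit depth increases: chroma bit depth must not decrease.
    if out_y > inp_y and out_c < inp_c:
        return False
    # Chroma bit depth increases: luma bit depth must not decrease.
    if out_c > inp_c and out_y < inp_y:
        return False
    return True


print(bit_depth_pairing_ok(8, 8, 10, 10))    # True
print(bit_depth_pairing_ok(10, 10, 12, 8))   # False
```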

[0209] The StoreOutputTensors() process derives the sample values in the filtered sample arrays FilteredYPic, FilteredCbPic, and FilteredCrPic from the output tensor for a given vertical sample coordinate cTop and horizontal sample coordinate cLeft, which specify the top-left sample position of the sample patch contained in the input tensor. This process may be expressed as the combination of Tables 14 and 15.

[0210] [Table 14]

[0211] [Table 15]

[0212] A value of 1 for nnpfc_separate_colour_description_present_flag indicates that a distinct combination of color primaries, transfer characteristics, matrix coefficients, and scaling and offset values for the picture resulting from the NNPF is specified in the SEI message syntax structure. A value of 0 for nnpfc_separate_colour_description_present_flag indicates that the combination of color primaries, transfer characteristics, matrix coefficients, and scaling and offset values for the picture resulting from the NNPF is identical to that indicated in the VUI parameters for the CLVS.

[0213] nnpfc_colour_primaries may have the same semantics as defined for the vui_colour_primaries syntax element, except as follows:

[0214] - nnpfc_colour_primaries can indicate the primary colors of a picture that appear as a result of applying the NNPF specified in the SEI message, rather than the primary colors used in CLVS.

[0215] - If nnpfc_colour_primaries is not present in the NNPFC SEI message, the value of nnpfc_colour_primaries may be inferred to be the same as the value of vui_colour_primaries.

[0216] nnpfc_transfer_characteristics may have the same semantics as defined for the vui_transfer_characteristics syntax element, except as follows:

[0217] - nnpfc_transfer_characteristics can indicate the transfer characteristics of the picture that results from applying the NNPF specified in the SEI message, rather than the transfer characteristics used for the CLVS.

[0218] - If nnpfc_transfer_characteristics is not present in the NNPFC SEI message, the value of nnpfc_transfer_characteristics may be inferred to be the same as the value of vui_transfer_characteristics.

[0219] nnpfc_matrix_coeffs can describe the equations used to derive luma and chroma signals from the green, blue, and red, or Y, Z, and X primary signals. The semantics of nnpfc_matrix_coeffs may be the same as those specified for vui_matrix_coeffs (MatrixCoefficients), except that they apply to the picture that results from applying the NNPF specified in the SEI message, with BitDepthY and BitDepthC taken to be equal to outTensorBitDepthY and outTensorBitDepthC, respectively. If nnpfc_matrix_coeffs is not present in the NNPFC SEI message, the value of nnpfc_matrix_coeffs may be inferred to be equal to vui_matrix_coeffs.

[0220] nnpfc_matrix_coeffs must not be equal to 0 unless both of the following conditions are true:

[0221] - nnpfc_out_tensor_chroma_bitdepth_minus8 is identical to nnpfc_out_tensor_luma_bitdepth_minus8.

[0222] - nnpfc_out_order_idc is the same as 2, outSubHeightC is the same as 1, and outSubWidthC is the same as 1.

[0223] nnpfc_matrix_coeffs must not be identical to 8 unless one of the following conditions is met:

[0224] - nnpfc_out_tensor_chroma_bitdepth_minus8 is identical to nnpfc_out_tensor_luma_bitdepth_minus8.

[0225] - nnpfc_out_tensor_chroma_bitdepth_minus8 is equal to nnpfc_out_tensor_luma_bitdepth_minus8+1, nnpfc_out_order_idc is equal to 2, outSubHeightC is equal to 1, and outSubWidthC is equal to 1.

[0226] nnpfc_full_range_flag can indicate the scaling and offset values applied in association with the matrix coefficients identified by nnpfc_matrix_coeffs. The semantics of nnpfc_full_range_flag may be the same as those specified for VideoFullRangeFlag. If nnpfc_full_range_flag is not present, its value may be inferred to be equal to 0.

[0227] A value of 1 for nnpfc_chroma_loc_info_present_flag indicates the presence of the nnpfc_chroma_sample_loc_type_frame syntax element in the NNPFC SEI message. A value of 0 for nnpfc_chroma_loc_info_present_flag indicates the absence of the nnpfc_chroma_sample_loc_type_frame syntax element in the NNPFC SEI message. If the value of colourizationFlag is 0 or nnpfc_out_colour_format_idc is not 1, the value of nnpfc_chroma_loc_info_present_flag may be restricted to be equal to 0.

[0228] If nnpfc_chroma_sample_loc_type_frame is not equal to 6 and nnpfc_out_colour_format_idc is equal to 1, then nnpfc_chroma_sample_loc_type_frame can indicate the position of the chroma sample in the output picture. If nnpfc_chroma_sample_loc_type_frame is equal to 6 and nnpfc_out_colour_format_idc is equal to 1, then it can indicate that the position of the chroma sample is unknown, not specified, or specified in another way. The value of nnpfc_chroma_sample_loc_type_frame must be within the range of 0 to 6.

[0229] nnpfc_overlap can indicate the number of horizontal and vertical overlapping samples of adjacent input tensors in NNPF. The value of nnpfc_overlap must be within the range of 0 to 16383.

[0230] A value of 1 for nnpfc_constant_patch_size_flag indicates that the NNPF accepts as input exactly the patch size specified by nnpfc_patch_width_minus1 and nnpfc_patch_height_minus1. A value of 0 for nnpfc_constant_patch_size_flag indicates that the NNPF accepts as input any patch size with width inpPatchWidth and height inpPatchHeight such that the width of the extended patch (i.e., the patch plus the overlapping area), which is equal to inpPatchWidth+2*nnpfc_overlap, is a positive integer multiple of nnpfc_extended_patch_width_cd_delta_minus1+1+2*nnpfc_overlap, and the height of the extended patch, which is equal to inpPatchHeight+2*nnpfc_overlap, is a positive integer multiple of nnpfc_extended_patch_height_cd_delta_minus1+1+2*nnpfc_overlap.

[0231] nnpfc_patch_width_minus1+1 can indicate the number of horizontal samples of the patch size required for input to the NNPF when the value of nnpfc_constant_patch_size_flag is 1. The value of nnpfc_patch_width_minus1 must be within the range of 0 to Min(32766,CroppedWidth-1).

[0232] nnpfc_patch_height_minus1+1 can indicate the number of vertical samples of the patch size required for input to the NNPF when the value of nnpfc_constant_patch_size_flag is 1. The value of nnpfc_patch_height_minus1 must be within the range of 0 to Min(32766,CroppedHeight-1).

[0233] nnpfc_extended_patch_width_cd_delta_minus1+1+2*nnpfc_overlap can represent the common divisor of the allowed width of the extended patch required for input to NNPF when the value of nnpfc_constant_patch_size_flag is 0. The value of nnpfc_extended_patch_width_cd_delta_minus1 must be in the range of 0 to Min(32766,CroppedWidth-1).

[0234] nnpfc_extended_patch_height_cd_delta_minus1+1+2*nnpfc_overlap can represent the common divisor of the allowed extended patch height required for input to NNPF when the value of nnpfc_constant_patch_size_flag is 0. The value of nnpfc_extended_patch_height_cd_delta_minus1 must be in the range of 0 to Min(32766,CroppedHeight-1).

[0235] The variables inpPatchWidth and inpPatchHeight may be set to the patch size width and patch size height, respectively.

[0236] If the value of nnpfc_constant_patch_size_flag is 0, the following may be applied.

[0237] - The values of inpPatchWidth and inpPatchHeight may be provided by external means or set by the post-processor.

[0238] - The value of inpPatchWidth + 2 * nnpfc_overlap must be a positive integer multiple of nnpfc_extended_patch_width_cd_delta_minus1 + 1 + 2 * nnpfc_overlap, and inpPatchWidth must be less than or equal to CroppedWidth. The value of inpPatchHeight + 2 * nnpfc_overlap must be a positive integer multiple of nnpfc_extended_patch_height_cd_delta_minus1 + 1 + 2 * nnpfc_overlap, and inpPatchHeight must be less than or equal to CroppedHeight.

[0239] Otherwise, (if the value of nnpfc_constant_patch_size_flag is 1), the value of inpPatchWidth may be set to the same as nnpfc_patch_width_minus1+1, and the value of inpPatchHeight may be set to the same as nnpfc_patch_height_minus1+1.
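The selection and validation of inpPatchWidth and inpPatchHeight described above may be sketched as follows. This is a sketch under stated assumptions: the function name input_patch_size and its keyword parameters are hypothetical, and the externally supplied patch size is modeled as plain function arguments.

```python
def input_patch_size(constant_patch_size_flag, overlap, cropped_w, cropped_h,
                     patch_width_minus1=0, patch_height_minus1=0,
                     width_cd_delta_minus1=0, height_cd_delta_minus1=0,
                     external_w=None, external_h=None):
    """Return (inpPatchWidth, inpPatchHeight) or raise ValueError."""
    if constant_patch_size_flag == 1:
        # Constant patch size: taken directly from the syntax elements.
        return patch_width_minus1 + 1, patch_height_minus1 + 1

    # Variable patch size: provided externally or chosen by the post-processor.
    if external_w is None or external_h is None:
        raise ValueError("patch size must be supplied when the flag is 0")

    # Common divisors of the allowed extended patch width and height.
    cd_w = width_cd_delta_minus1 + 1 + 2 * overlap
    cd_h = height_cd_delta_minus1 + 1 + 2 * overlap

    for size, cd, limit in ((external_w, cd_w, cropped_w),
                            (external_h, cd_h, cropped_h)):
        extended = size + 2 * overlap
        # Extended size must be a positive integer multiple of the divisor,
        # and the patch itself must fit inside the cropped picture.
        if size < 1 or extended % cd != 0 or size > limit:
            raise ValueError("patch size violates the extended-patch constraint")
    return external_w, external_h


print(input_patch_size(1, 8, 1920, 1080,
                       patch_width_minus1=127, patch_height_minus1=127))
print(input_patch_size(0, 8, 1920, 1080,
                       width_cd_delta_minus1=15, height_cd_delta_minus1=15,
                       external_w=112, external_h=48))
```

In the second call the common divisor is 15 + 1 + 2 * 8 = 32, and the extended patch sizes 112 + 16 = 128 and 48 + 16 = 64 are both multiples of 32.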

[0240] The variables outPatchWidth, outPatchHeight, horCScaling, verCScaling, outPatchCWidth, and outPatchCHeight may be derived as shown in Table 16.

[0241] [Table 16]

[0242] outPatchWidth*CroppedWidth must be the same as nnpfcOutputPicWidth*inpPatchWidth, and outPatchHeight*CroppedHeight must be the same as nnpfcOutputPicHeight*inpPatchHeight.

[0243] As described in Table 17, nnpfc_padding_type can indicate the padding process when referring to sample positions outside the boundaries of the input picture. The value of nnpfc_padding_type must exist within the range of 0 to 4. Values from 5 to 15 for nnpfc_padding_type are reserved for future use and shall not exist in the bitstream. The decoder must ignore NNPFC SEI messages having nnpfc_padding_type in the range of 5 to 15. Values of nnpfc_padding_type exceeding 15 shall not exist in the bitstream and are not reserved for future use.

[0244] [Table 17]

[0245] When the value of nnpfc_padding_type is 4, nnpfc_luma_padding_val can indicate the luma value used for padding. The value of nnpfc_luma_padding_val must exist within the range of 0 to (1<<BitDepthY)-1.

[0246] When the value of nnpfc_padding_type is 4, nnpfc_cb_padding_val can indicate the Cb value used for padding. The value of nnpfc_cb_padding_val must exist within the range of 0 to (1<<BitDepthC)-1.

[0247] When the value of nnpfc_padding_type is 4, nnpfc_cr_padding_val can indicate the Cr value used for padding. The value of nnpfc_cr_padding_val must exist within the range of 0 to (1<<BitDepthC)-1.

[0248] The function InpSampleVal(y, x, picHeight, picWidth, croppedPic, cIdx) takes as inputs the vertical sample position y, the horizontal sample position x, the picture height picHeight, the picture width picWidth, the sample array croppedPic, and the color component index cIdx (0 for luma, 1 for Cb, and 2 for Cr), and can return the sample value sampleVal derived as shown in Table 18.

[0249] [Table 18]

[0250] The NNPF PostProcessingFilter() may be a target NNPF derived from the semantics of the NNPFA SEI message.

[0251] The processes in Table 19 may be used to filter patch by patch using the NNPF PostProcessingFilter() and to generate filtered and/or interpolated pictures, which may include a Y sample array FilteredYPic, a Cb sample array FilteredCbPic, and a Cr sample array FilteredCrPic, as indicated by nnpfc_out_order_idc.

[0252] [Table 19]

[0253] An NNPF-generated picture having index i may include the sample arrays FilteredYPic[i], FilteredCbPic[i], and FilteredCrPic[i], if present. The NNPF-generated picture does not need to include overlapping regions.

[0254] The NNPF process is configured to output NNPF-generated pictures in ascending order of index, as defined in Table 19. Here, all NNPF-generated pictures interpolated by NNPF are output, and all NNPF-generated pictures corresponding to the pictures input to NNPF are output, as specified in the semantics of the NNPFA SEI message.

[0255] A value of 1 for nnpfc_complexity_info_present_flag indicates that there is one or more syntax elements that indicate the complexity of the NNPF associated with the nnpfc_id. A value of 0 for nnpfc_complexity_info_present_flag indicates that there are no syntax elements that indicate the complexity of the NNPF associated with the nnpfc_id.

[0256] A value of 0 for nnpfc_parameter_type_idc can indicate that the neural network uses only integer parameters. A value of 1 for nnpfc_parameter_type_idc can indicate that the neural network can use floating-point or integer parameters. A value of 2 for nnpfc_parameter_type_idc can indicate that the neural network uses only binary parameters. A value of 3 for nnpfc_parameter_type_idc may be reserved for future use and shall not be present in the bitstream. The decoder must ignore NNPFC SEI messages where the value of nnpfc_parameter_type_idc is 3.

[0257] The values 0, 1, 2, and 3 for nnpfc_log2_parameter_bit_length_minus3 indicate that the neural network does not use parameters with bit lengths greater than 8, 16, 32, and 64, respectively. If nnpfc_parameter_type_idc is present and nnpfc_log2_parameter_bit_length_minus3 is not present, the neural network does not use parameters with a bit length greater than 1.
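The mapping from nnpfc_log2_parameter_bit_length_minus3 to the maximum parameter bit length (0, 1, 2, 3 giving 8, 16, 32, 64) is a power of two and may be sketched as follows. The function name max_parameter_bit_length is hypothetical, and an absent syntax element is modeled as None.

```python
def max_parameter_bit_length(log2_bit_length_minus3=None):
    """Return the largest parameter bit length the network may use."""
    if log2_bit_length_minus3 is None:
        # Syntax element absent: parameters are at most 1 bit long.
        return 1
    if not 0 <= log2_bit_length_minus3 <= 3:
        raise ValueError("nnpfc_log2_parameter_bit_length_minus3 out of range")
    # 2 ** (value + 3): 0 -> 8, 1 -> 16, 2 -> 32, 3 -> 64
    return 1 << (log2_bit_length_minus3 + 3)


print([max_parameter_bit_length(v) for v in range(4)])   # [8, 16, 32, 64]
```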

[0258] nnpfc_num_parameters_idc can indicate the maximum number of neural network parameters for the NNPF in units of 2048. A value of 0 for nnpfc_num_parameters_idc indicates that the maximum number of neural network parameters is unknown. The value of nnpfc_num_parameters_idc must be in the range of 0 to 52. Values of nnpfc_num_parameters_idc greater than 52 shall not be present in the bitstream. The decoder must ignore NNPFC SEI messages with nnpfc_num_parameters_idc greater than 52.

[0259] If the value of nnpfc_num_parameters_idc is greater than 0, the variable maxNumParameters may be derived as shown in equation 9.

[0260] [Equation 9]

[0261] A value of nnpfc_num_kmac_operations_idc greater than 0 can indicate that the maximum number of multiply-accumulate operations per sample for the NNPF is less than or equal to nnpfc_num_kmac_operations_idc * 1000. A value of 0 for nnpfc_num_kmac_operations_idc can indicate that the maximum number of multiply-accumulate operations for the network is unknown. The value of nnpfc_num_kmac_operations_idc must be within the range of 0 to 2^32 - 2.

[0262] A value of nnpfc_total_kilobyte_size greater than 0 can indicate the total size (in kilobytes) required to store the uncompressed parameters of the neural network. The total size in bits may be a number greater than or equal to the sum of the bits used to store each parameter. nnpfc_total_kilobyte_size may be the rounded value obtained by dividing the total size (in bits) by 8000. A value of 0 for nnpfc_total_kilobyte_size can indicate that the total size required to store the parameters of the neural network is unknown. The value of nnpfc_total_kilobyte_size must be within the range of 0 to 2^32 - 2.
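The two complexity indicators in paragraphs [0261] and [0262] may be illustrated with the following sketch. The function names are hypothetical, and rounding the kilobyte size up (rather than to nearest) is an assumption, since the text only says the value is rounded.

```python
def max_macs_per_sample(num_kmac_operations_idc):
    """Upper bound on multiply-accumulate operations per sample, or None."""
    if num_kmac_operations_idc == 0:
        return None  # 0 signals that the bound is unknown
    return num_kmac_operations_idc * 1000


def total_kilobyte_size(parameter_bit_lengths):
    """Kilobyte size for storing the given per-parameter bit lengths."""
    total_bits = sum(parameter_bit_lengths)
    # ASSUMPTION: round up; 1 kilobyte is taken as 8000 bits, as in the text.
    return -(-total_bits // 8000)


print(max_macs_per_sample(5))               # 5000
print(total_kilobyte_size([32] * 1000))     # 4
```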

[0263] A value of 0 for nnpfc_metadata_extension_num_bits can indicate that nnpfc_reserved_metadata_extension is not present. A value of nnpfc_metadata_extension_num_bits greater than 0 can indicate the length (in bits) of nnpfc_reserved_metadata_extension. The value of nnpfc_metadata_extension_num_bits must be equal to 0 in the bitstream. Values of nnpfc_metadata_extension_num_bits in the range of 1 to 2048 are reserved for future use and shall not be present in the bitstream. The decoder must nevertheless allow any value of nnpfc_metadata_extension_num_bits in the range of 0 to 2048. Values of nnpfc_metadata_extension_num_bits greater than 2048 shall not be present in the bitstream and are not reserved for future use.

[0264] nnpfc_reserved_metadata_extension shall not be present in the bitstream. However, the decoder must ignore the presence and value of nnpfc_reserved_metadata_extension. If nnpfc_reserved_metadata_extension is present, its length may be equal to nnpfc_metadata_extension_num_bits.

[0265] nnpfc_reserved_zero_bit_b must be equivalent to 0 in the bitstream. The decoder must ignore NNPFC SEI messages where nnpfc_reserved_zero_bit_b is not 0.

[0266] nnpfc_payload_byte[i] may contain the i-th byte of a bitstream. The byte sequence nnpfc_payload_byte[i] for all present values of i must be a complete bitstream conforming to ISO/IEC 15938-17.

[0267] Neural network post-filter activation (NNPFA)

[0268] Table 20 shows the syntax structure for NNPFA.

[0269] [Table 20]

[0270] The NNPFA syntax structure in Table 20 may be signaled in the form of an SEI message. An SEI message that signals the NNPFA syntax structure in Table 20 may be called an NNPFA SEI message.

[0271] The NNPFA SEI message can activate or deactivate the possible use of the target neural network post-processing filter (NNPF) identified by nnpfa_target_id for post-processing filtering of a set of pictures. For a specific picture for which the NNPF is activated, the target NNPF may be determined as follows:

[0272] - If nnpfa_target_base_flag is 1, the target NNPF may be a base NNPF with the same nnpfc_id as nnpfa_target_id.

[0273] - Otherwise (if nnpfa_target_base_flag is 0), the target NNPF may be the NNPF identified by the last NNPFC SEI message having nnpfc_id equal to nnpfa_target_id. Here, the last NNPFC SEI message must precede the first VCL NAL unit of the current picture in decoding order and must not be a repetition of the NNPFC SEI message containing the base NNPF.
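The two-way selection of the target NNPF may be sketched as follows. This is only an illustration under stated assumptions: the NnpfcMessage class, its is_base field (marking the message that carries the base NNPF or a repeat of it), and the function name select_target_nnpf are hypothetical, and the input list is assumed to hold, in decoding order, the NNPFC SEI messages that precede the first VCL NAL unit of the current picture.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class NnpfcMessage:
    nnpfc_id: int
    is_base: bool  # hypothetical: True for the base NNPF message or its repeat


def select_target_nnpf(preceding: Sequence[NnpfcMessage],
                       nnpfa_target_id: int,
                       nnpfa_target_base_flag: int) -> Optional[NnpfcMessage]:
    """Pick the NNPFC message that defines the target NNPF, if any."""
    same_id = [m for m in preceding if m.nnpfc_id == nnpfa_target_id]
    if nnpfa_target_base_flag == 1:
        # Target is the base NNPF with the matching identifier.
        return next((m for m in same_id if m.is_base), None)
    # Target is the last matching message that is not a repeat of the base.
    updates = [m for m in same_id if not m.is_base]
    return updates[-1] if updates else None


msgs = [NnpfcMessage(5, True), NnpfcMessage(5, False),
        NnpfcMessage(7, True), NnpfcMessage(5, True)]
print(select_target_nnpf(msgs, 5, 1) is msgs[0])   # True
print(select_target_nnpf(msgs, 5, 0) is msgs[1])   # True
```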

[0274] Multiple NNPFA SEI messages may be present for the same picture, for example when the NNPFs are intended for different purposes or filter different color components.

[0275] nnpfa_target_id can indicate the target NNPF that is associated with the current picture and is specified by one or more NNPFC SEI messages having nnpfc_id equal to nnpfa_target_id. The value of nnpfa_target_id must be in the range of 0 to 2^32 - 2.

[0276] An NNPFA SEI message with a specific value of nnpfa_target_id must not be present in the current PU unless one or both of the following conditions are true:

[0277] - Within the current CLVS, an NNPFC SEI message with nnpfc_id equal to the specific value of nnpfa_target_id is present in a PU that precedes the current PU in decoding order.

[0278] - An NNPFC SEI message with nnpfc_id equal to the specific value of nnpfa_target_id is present in the current PU.

[0279] If a PU contains both an NNPFC SEI message with a specific value of nnpfc_id and an NNPFA SEI message with nnpfa_target_id equal to that value of nnpfc_id, the NNPFC SEI message must precede the NNPFA SEI message in decoding order.

[0280] A value of 1 for nnpfa_cancel_flag can indicate that the persistence of the target NNPF established by any previous NNPFA SEI message having the same nnpfa_target_id as the current SEI message is canceled. That is, the target NNPF is no longer used unless it is activated by another NNPFA SEI message having the same nnpfa_target_id as the current SEI message and nnpfa_cancel_flag equal to 0. A value of 0 for nnpfa_cancel_flag can indicate that nnpfa_target_base_flag, nnpfa_persistence_flag, and nnpfa_num_output_entries follow.

[0281] A value of 1 for nnpfa_target_base_flag indicates that the target NNPF is the base NNPF with nnpfc_id equal to nnpfa_target_id. A value of 0 for nnpfa_target_base_flag indicates that the target NNPF is the NNPF identified by the last NNPFC SEI message with nnpfc_id equal to nnpfa_target_id. Here, the last NNPFC SEI message precedes the first VCL NAL unit of the current picture in decoding order and is not a repetition of the NNPFC SEI message containing the base NNPF.

[0282] The nnpfa_persistence_flag can indicate the persistence of the target NNPF for the current layer. A value of nnpfa_persistence_flag of 0 indicates that the target NNPF may only be used for post-processing filtering on the current picture. A value of nnpfa_persistence_flag of 1 indicates that the target NNPF may be used for post-processing filtering on the current picture and all subsequent pictures in the current layer in the output order until one or more of the following conditions are true.

[0283] - A new CLVS for the current layer is started.

[0284] - The bitstream ends.

[0285] - A picture in the current layer that is associated with an NNPFA SEI message having the same nnpfa_target_id and nnpfa_cancel_flag as the current SEI message is output after the current picture in output order.

[0286] The target NNPF is not applied to such a subsequent picture in the current layer that is associated with an NNPFA SEI message having the same nnpfa_target_id and nnpfa_cancel_flag as the current SEI message.

[0287] nnpfcTargetPictures may be the set of pictures associated with the last NNPFC SEI message that precedes the current NNPFA SEI message in decoding order and has nnpfc_id equal to nnpfa_target_id. nnpfaTargetPictures may be the set of pictures for which the target NNPF is activated by the current NNPFA SEI message. Every picture included in nnpfaTargetPictures should also be included in nnpfcTargetPictures.

[0288] nnpfa_num_output_entries can indicate the number of nnpfa_output_flag[i] syntax elements in an NNPFA SEI message. The value of nnpfa_num_output_entries must be within the range of 0 to NumInpPicsInOutputTensor.

[0289] A value of 1 for nnpfa_output_flag[i] indicates that the NNPF-generated picture corresponding to the input picture with index InpIdx[i] is output by the NNPF process activated by the NNPFA SEI message. Here, the NNPF process may be identified by the semantics of the NNPFA SEI message. A value of 0 for nnpfa_output_flag[i] indicates that the NNPF-generated picture corresponding to the input picture with index InpIdx[i] is not output by the NNPF process activated by the NNPFA SEI message. When nnpfa_num_output_entries is less than NumInpPicsInOutputTensor, nnpfa_output_flag[i] may be inferred to be 1 for each value of i in the range of nnpfa_num_output_entries to NumInpPicsInOutputTensor - 1.
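The inference rule for unsignaled nnpfa_output_flag[i] values may be sketched as follows. The function name effective_output_flags is hypothetical; the list of signaled flags stands in for the nnpfa_num_output_entries flags carried in the NNPFA SEI message, and rejecting more signaled flags than NumInpPicsInOutputTensor is an assumption of this sketch.

```python
def effective_output_flags(signalled_flags, num_inp_pics_in_output_tensor):
    """Return one output flag per input picture present in the output tensor."""
    num_entries = len(signalled_flags)  # plays the role of nnpfa_num_output_entries
    if num_entries > num_inp_pics_in_output_tensor:
        # ASSUMPTION: more entries than NumInpPicsInOutputTensor is invalid.
        raise ValueError("too many nnpfa_output_flag entries")
    # Flags that are not signaled are inferred to be 1 (picture is output).
    missing = num_inp_pics_in_output_tensor - num_entries
    return list(signalled_flags) + [1] * missing


print(effective_output_flags([0, 1], 4))   # [0, 1, 1, 1]
```

With two signaled flags and four input pictures in the output tensor, the pictures with indices InpIdx[2] and InpIdx[3] are treated as output.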

[0290] Post-filter hint

[0291] Table 21 shows the syntax structure for post-filter hints.

[0292] [Table 21]

[0293] The post-filter hint syntax structures in Table 21 may be signaled in the form of SEI messages. SEI messages that signal the post-filter hint syntax structures in Table 21 can be called post-filter hint SEI messages.

[0294] Post-filter hint SEI messages can provide post-filter coefficients or correlation information for the design of a post-filter, for potential use in post-processing of the decoded and output picture set to obtain improved display quality.

[0295] A value of 1 for filter_hint_cancel_flag indicates that the SEI message cancels the persistence of any previous post-filter hint SEI message in output order that applies to the current layer. A value of 0 for filter_hint_cancel_flag indicates that post-filter hint information follows.

[0296] filter_hint_persistence_flag can indicate the persistence of the post-filter hint SEI message for the current layer. A value of 0 for filter_hint_persistence_flag indicates that the post-filter hint applies only to the currently decoded picture. A value of 1 for filter_hint_persistence_flag indicates that the post-filter hint SEI message applies to the currently decoded picture and persists for all subsequent pictures in the current layer in output order until one or more of the following conditions are true:

[0297] - A new CLVS for the current layer is started.

[0298] - The bitstream ends.

[0299] - A picture in the current layer, in an AU associated with a post-filter hint SEI message, is output that follows the current picture in output order.

[0300] `filter_hint_size_y` can represent the vertical size of the filter coefficient array or correlation array. The value of `filter_hint_size_y` must be in the range of 1 to 15.

[0301] `filter_hint_size_x` can represent the horizontal size of the filter coefficient array or correlation array. The value of `filter_hint_size_x` must be in the range of 1 to 15.

[0302] `filter_hint_type` can indicate the type of filter hint transmitted, as shown in Table 22. The value of `filter_hint_type` must be in the range of 0 to 2. A `filter_hint_type` value of 3 must not be present in the bitstream, and the decoder must ignore any post-filter hint SEI message in which `filter_hint_type` is 3.

[0303] [Table 22]
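The range constraints on `filter_hint_size_y`, `filter_hint_size_x`, and `filter_hint_type` can be sketched as a small check. The function name `check_filter_hint_header` and the choice to raise an exception for out-of-range values are assumptions made for illustration; only the ranges and the rule of ignoring type 3 come from the text above.

```python
def check_filter_hint_header(size_y, size_x, hint_type):
    """Return True if the hint may be used, False if it is to be ignored."""
    if hint_type == 3:
        # Decoders ignore post-filter hint SEI messages with filter_hint_type 3.
        return False
    if not 0 <= hint_type <= 2:
        raise ValueError("filter_hint_type must be in the range 0 to 2")
    if not (1 <= size_y <= 15 and 1 <= size_x <= 15):
        raise ValueError("filter_hint_size_y and filter_hint_size_x must be in the range 1 to 15")
    return True


print(check_filter_hint_header(5, 5, 0))
print(check_filter_hint_header(2, 7, 3))
```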

[0304] A value of 1 for filter_hint_chroma_coeff_present_flag indicates that a filter coefficient exists for the chroma. A value of 0 for filter_hint_chroma_coeff_present_flag indicates that no filter coefficient exists for the chroma.

[0305] `filter_hint_value[cIdx][cy][cx]` can represent the filter coefficients, or the cross-correlation matrix elements between the original signal and the decoded signal, with 16-bit precision. The value of `filter_hint_value[cIdx][cy][cx]` must be within the range of -2^31 + 1 to 2^31 - 1. cIdx may indicate the associated color component, cy may indicate the vertical counter, and cx may indicate the horizontal counter. Depending on the value of filter_hint_type, the following may be applied:

[0306] - If the value of filter_hint_type is 0, the coefficients of a 2D FIR (Finite Impulse Response) filter of size filter_hint_size_y * filter_hint_size_x may be transmitted.

[0307] - On the other hand, if the value of filter_hint_type is 1, the filter coefficients of two one-dimensional FIR filters may be transmitted. In this case, the value of filter_hint_size_y must be 2. An index cy of 0 can indicate the filter coefficient of a horizontal filter, and a cy of 1 can indicate the filter coefficient of a vertical filter. In the filtering process, the horizontal filter may be applied first, and the result may be filtered by the vertical filter.

[0308] - Otherwise (if the value of filter_hint_type is 2), the transmitted hint can indicate the cross-correlation matrix between the original signal s and the decoded signal s'.
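The case of filter_hint_type equal to 1 (two one-dimensional FIR filters, horizontal first, then vertical) can be sketched in plain Python as follows. The helper names are hypothetical. Several details are assumptions made for the sketch because they are not specified here: edge samples are replicated at picture borders, the taps are applied in correlation form around the center tap, and no coefficient scaling or clipping is applied.

```python
def clamp(value, low, high):
    """Limit value to the inclusive range [low, high]."""
    return max(low, min(high, value))


def filter_rows(plane, taps):
    """Apply a 1-D FIR filter along each row, replicating edge samples."""
    half = len(taps) >> 1
    width = len(plane[0])
    return [
        [
            sum(t * row[clamp(x + k - half, 0, width - 1)] for k, t in enumerate(taps))
            for x in range(width)
        ]
        for row in plane
    ]


def transpose(plane):
    """Swap rows and columns of a 2-D list."""
    return [list(column) for column in zip(*plane)]


def apply_separable_hint(plane, hint):
    """hint[0] holds the horizontal taps (cy = 0), hint[1] the vertical taps (cy = 1)."""
    horizontal, vertical = hint[0], hint[1]
    after_horizontal = filter_rows(plane, horizontal)
    # The vertical filter is applied to the result of the horizontal filter.
    return transpose(filter_rows(transpose(after_horizontal), vertical))


print(apply_separable_hint([[5, 5], [5, 5]], [[1, 2, 1], [1, 2, 1]]))
```

In this sketch a flat 2x2 plane of value 5 filtered with taps 1, 2, 1 in both directions yields 80 at every position, since each pass multiplies the flat value by the tap sum of 4.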

[0309] The normalized cross-correlation matrix for the relevant color component identified by cIdx of size filter_hint_size_y * filter_hint_size_x may be defined as in Equation 10.

[0310] [Equation 10]

[0311] In Equation 10, s represents the sample array of the color component cIdx of the original picture, s' represents the corresponding array of the decoded picture, h represents the vertical height of the relevant color component, w represents the horizontal width of the relevant color component, and bitDepth represents the bit depth of the color component. Also, OffsetY is equal to (filter_hint_size_y >> 1), OffsetX is equal to (filter_hint_size_x >> 1), the range of cy is 0 <= cy < filter_hint_size_y, and the range of cx is 0 <= cx < filter_hint_size_x.
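A normalized cross-correlation built from the quantities defined for Equation 10 can be sketched as follows. Because the equation itself is not reproduced in this text, the exact form used here is an assumption: the sum of s[m][n] * s'[m + cy - OffsetY][n + cx - OffsetX] over the picture, divided by ((1 << bitDepth) - 1)^2 * h * w, with out-of-picture samples of s' skipped. The function name `cross_correlation_hint` is hypothetical.

```python
def cross_correlation_hint(s, s_dec, size_y, size_x, bit_depth):
    """Return a size_y x size_x matrix of normalized cross-correlation values."""
    h, w = len(s), len(s[0])
    offset_y, offset_x = size_y >> 1, size_x >> 1
    # Assumed normalization: squared maximum sample value times the sample count.
    norm = ((1 << bit_depth) - 1) ** 2 * h * w
    matrix = []
    for cy in range(size_y):
        row = []
        for cx in range(size_x):
            acc = 0
            for m in range(h):
                for n in range(w):
                    mm, nn = m + cy - offset_y, n + cx - offset_x
                    # Assumption: positions outside the decoded picture contribute nothing.
                    if 0 <= mm < h and 0 <= nn < w:
                        acc += s[m][n] * s_dec[mm][nn]
            row.append(acc / norm)
        matrix.append(row)
    return matrix


print(cross_correlation_hint([[255, 255]], [[255, 255]], 1, 3, 8))
```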

[0312] The decoder can derive the Wiener post-filter from the cross-correlation matrix between the original signal and the decoded signal and the autocorrelation matrix of the decoded signal.

[0313] Problems with conventional technology

[0314] The current design of NNPF-related SEI messages (e.g., the NNPFC (neural-network post-filter characteristics) SEI (supplemental enhancement information) message and / or the NNPFA (neural-network post-filter activation) SEI message) allows an activated NNPF to use multiple input pictures, but one or more of those input pictures may be unavailable (e.g., an input picture is not present in the bitstream). For example, this may occur when the NNPF is activated at the beginning of the bitstream. When one or more input pictures are unavailable, a flag may be provided indicating whether each unavailable (e.g., missing) picture is replaced by a duplicate of the last available input picture or by a picture with zero-valued pixels. This flag is signaled in the NNPFC SEI message and may be nnpfc_absent_input_pic_zero_flag. In addition, for each input picture to the NNPF, information may be signaled on whether the filtering process using the NNPF outputs an output picture associated with that input picture. This signaling may be included in the NNPFC SEI message, but it may be updated by signaling in the NNPFA SEI message associated with the NNPFC SEI message, and in some cases the output of the picture may be canceled.

[0315] In this case, however, problems may arise in relation to output pictures associated with unavailable input pictures. While it is preferable that no output picture is associated with an unavailable input picture, the current design of the NNPFC and NNPFA SEI messages does not clearly provide a way to prevent this.

[0316] Summary of Examples

[0317] This disclosure proposes various embodiments that can solve the problems of the conventional designs described above. The embodiments described below may be used independently, or they may be used in combination with each other or with other embodiments, and these may also be included in this disclosure.

[0318] 1. As one embodiment, if an NNPFA SEI message activates an NNPFC SEI message for a target NNPF that can take multiple input pictures as input, one or more input pictures are unavailable (e.g., not present in the bitstream), and there is an associated output picture for an unavailable input picture (e.g., the value of nnpfc_input_pic_output_flag[i] for the input picture is 1), then nnpfa_output_flag[i] must be present for each unavailable input picture that has an associated output picture, and the value of that syntax element must be a specific value (e.g., 0).

[0319] 2. Alternatively, if an NNPFA SEI message activates an NNPFC SEI message for a target NNPF that can take multiple input pictures as input, one or more input pictures are unavailable (e.g., not present in the bitstream), and there are associated output pictures for the unavailable input pictures (e.g., the value of nnpfc_input_pic_output_flag[i] for the input picture is 1), then the following may apply:

[0320] o The value of a specific syntax element (e.g., nnpfa_num_output_entries) must be greater than a specific value (e.g., 0).

[0321] o Alternatively, the value of the specific syntax element (e.g., nnpfa_num_output_entries) may be restricted to a range of values. For example, the value must not be less than a certain value (e.g., X+1), where X may be the index value of the last input picture that has an associated output picture (e.g., the picture furthest from the current picture).

[0322] o The value of a specific syntax element (e.g., nnpfa_output_flag[i]) for an input picture that is unavailable but has been signaled in the NNPFC SEI message as having an associated output picture may be restricted to a specific value (e.g., 0).

[0323] 3. Alternatively, if an NNPFA SEI message activates an NNPFC SEI message for a target NNPF that can take multiple input pictures as input, one or more input pictures are unavailable (e.g., not present in the bitstream), and there are associated output pictures for the unavailable input pictures (e.g., the value of nnpfc_input_pic_output_flag[i] for the input picture is 1), the NNPF-based filtering process may be restricted from outputting output pictures associated with such input pictures.

[0324] The following describes in more detail various embodiments, including the embodiments described above, and presents improvements to input and output pictures in NNPF (neural-network post-filter) related SEI messages for coded video bitstreams. The embodiments described below were created based on standard video codecs (e.g., VVC (Versatile Video Coding)) and VSEI (Versatile Supplemental Enhancement Information Messages for Coded Video Bitstreams), but it is obvious that they can also be applied to other video coding techniques, and this is also included in the scope of this disclosure.

[0325] On the other hand, when describing the embodiments below, NNPF (Neural-network post-filter) SEI messages or NNPF-related SEI messages may include NNPFC (Neural-network post-filter Characteristic) SEI messages and / or NNPFA (Neural-network post-filter activation) SEI messages, etc.

[0326] On the other hand, the syntax names used when describing the embodiments below are arbitrarily designated for clarity of explanation, so it is obvious that the syntax names can be changed, and even if the syntax names are changed, they can still be considered included in this disclosure.

[0327] The embodiments of this application will be described in detail below with reference to the drawings.

[0328] Example 1

[0329] Example 1 provides a detailed explanation of the embodiment described in item 1 of the Summary of Examples above. The VSEI message syntax and semantics are described below.

[0330] As an example, the syntax and semantics of NNPF (Neural-network post-filter) related SEI messages (e.g., NNPFA SEI messages) may be modified as follows. The NNPFA SEI message syntax may be signaled as shown in the following table.

[0331] [Table 23]

[0332] As an example, the syntax elements disclosed in Table 23, namely nnpfa_target_id, nnpfa_cancel_flag, nnpfa_target_base_flag, nnpfa_persistence_flag, nnpfa_num_output_entries, and nnpfa_output_flag[i], are as described above, so a redundant explanation will be omitted. Meanwhile, nnpfa_output_flag[i], which may be included in the output information (e.g., output picture output information) of an output picture corresponding to an input picture (e.g., an NNPF-generated picture), may be present for each input picture. However, if the target NNPF requires multiple input pictures, one or more input pictures are not present in the bitstream (e.g., unavailable), and there is an associated output picture for the unavailable input picture (e.g., the value of the associated nnpfc_input_pic_output_flag[i] in the NNPFC SEI message is 1), the value of the syntax element may be restricted to a specific value. For example, nnpfa_output_flag[i] must be present for each unavailable input picture that has an associated output picture, and the value of the syntax element may be restricted to 0.
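The constraint of Example 1 can be sketched as a conformance check. The function name `find_example1_violations` and the data layout are hypothetical. As a simplifying assumption, all lists are indexed directly by input picture index (the InpIdx[] mapping is ignored), and the signaled nnpfa_output_flag values are held in a dictionary so that a missing key models a flag that is not present.

```python
def find_example1_violations(input_present, nnpfc_input_pic_output_flag, nnpfa_output_flag):
    """Return the input picture indices that violate the Example 1 constraint.

    input_present[i]               -- True if input picture i is available
    nnpfc_input_pic_output_flag[i] -- 1 if an output picture is associated with picture i
    nnpfa_output_flag              -- dict {i: value} of flags present in the NNPFA SEI message
    """
    violations = []
    for i, present in enumerate(input_present):
        if present or nnpfc_input_pic_output_flag[i] != 1:
            continue  # the constraint only concerns unavailable pictures with an output
        # The flag must be present and equal to 0 (the "specific value" in the example).
        if nnpfa_output_flag.get(i) != 0:
            violations.append(i)
    return violations


print(find_example1_violations([True, False, False], [1, 1, 0], {0: 1, 1: 0}))
print(find_example1_violations([True, False, False], [1, 1, 0], {0: 1, 1: 1}))
```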

[0333] Example 2

[0334] Example 2 provides a detailed explanation of the embodiment described in item 2 of the Summary of Examples above. The VSEI message syntax and semantics are described below.

[0335] As an example, as shown in Table 23, a VSEI message may be signaled with syntax elements such as nnpfa_target_id, and nnpfa_output_flag[i] may be included in the SEI message. Meanwhile, nnpfa_output_flag[i], which may be included in information about the output of an output picture corresponding to an input picture (e.g., an NNPF-generated picture), may be present for each input picture. However, if the target NNPF requires multiple input pictures, one or more input pictures are not present in the bitstream (e.g., unavailable), and there is an associated output picture for the unavailable input picture (e.g., the value of the associated nnpfc_input_pic_output_flag[i] in the NNPFC SEI message is 1), then constraints on nnpfa_output_flag[i] and / or nnpfa_num_output_entries may exist. For example, the value of nnpfa_num_output_entries, which is information related to the number of nnpfa_output_flag[i] entries, may be restricted. As an example, the value of nnpfa_num_output_entries may be restricted to be not less than a certain value (e.g., X+1), where X is the index of the last input picture for which an associated output picture exists (e.g., the index of the picture furthest from the current picture). Also, the value of nnpfa_output_flag[i] for an input picture that is unavailable but is signaled in the NNPFC SEI message as having an associated output picture may be set to a specific value (e.g., 0).
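The lower bound on nnpfa_num_output_entries in Example 2 can be sketched as follows. The function name `required_min_output_entries` is hypothetical. The sketch reads X literally as the index of the last input picture whose nnpfc_input_pic_output_flag[i] is 1, which is one possible interpretation, and it indexes the lists directly by input picture index as a simplification.

```python
def required_min_output_entries(input_present, nnpfc_input_pic_output_flag):
    """Return the smallest nnpfa_num_output_entries allowed under Example 2.

    Returns 0 when the triggering condition does not hold, i.e. when no
    unavailable input picture has an associated output picture.
    """
    triggered = any(
        (not present) and flag == 1
        for present, flag in zip(input_present, nnpfc_input_pic_output_flag)
    )
    if not triggered:
        return 0
    # X: index of the last input picture that has an associated output picture.
    x = max(i for i, flag in enumerate(nnpfc_input_pic_output_flag) if flag == 1)
    return x + 1


print(required_min_output_entries([True, True, False], [1, 0, 1]))
```

With at least X+1 entries signaled, an nnpfa_output_flag[i] value is explicitly present for every picture up to index X, so none of them falls back to the inferred value of 1.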

[0336] Example 3

[0337] Example 3 provides a detailed explanation of the embodiment described in item 3 of the Summary of Examples above. The VSEI message syntax and semantics are described below.

[0338] For example, information such as nnpfc_num_input_pics_minus1, which relates to the number of pictures input to the NNPF; nnpfc_input_pic_output_flag[i], which relates to whether or not output pictures are generated (e.g., information about the existence of output pictures); and nnpfc_absent_input_pic_zero_flag, which relates to how sample values of input pictures that are not present in the bitstream are represented, may be included in NNPF-related messages and signaled.

[0339] On the other hand, if an NNPFC SEI message is activated by an NNPFA SEI message for a target NNPF that takes multiple input pictures as input, and one or more input pictures are unavailable (e.g., not present in the bitstream), and there are associated output pictures for the unavailable input pictures (e.g., the value of nnpfc_input_pic_output_flag[i] for the input picture is 1), the filtering process must not output any output pictures associated with such input pictures. For example, in this case, the filtering process may be an NNPF-based filtering process, and it is acceptable for no output pictures to exist, i.e., for no output pictures to be generated.
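The behavior of Example 3, where the filtering process itself withholds output pictures tied to unavailable inputs, can be sketched as follows. The function name `select_nnpf_outputs` and the representation of pictures as opaque Python objects are hypothetical, and the lists are indexed directly by input picture index as a simplification.

```python
def select_nnpf_outputs(generated, input_present, nnpfc_input_pic_output_flag):
    """Return (index, picture) pairs that the filtering process may output."""
    outputs = []
    for i, picture in enumerate(generated):
        if nnpfc_input_pic_output_flag[i] != 1:
            continue  # no output picture is associated with this input picture
        if not input_present[i]:
            continue  # Example 3: nothing is output for an unavailable input picture
        outputs.append((i, picture))
    return outputs


# Picture 1 has an associated output but its input is unavailable, so it is withheld.
print(select_nnpf_outputs(["a", "b", "c"], [True, False, True], [1, 1, 0]))
```

Unlike Examples 1 and 2, this approach needs no additional constraint on the NNPFA SEI message syntax, because the suppression is part of the filtering process. The result may be an empty list, which corresponds to the case where no output picture exists.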

[0340] According to the embodiments described in this disclosure, in addition to solving the problems of the prior art mentioned above, it is possible to reduce decoder errors and improve coding quality and efficiency by diversifying the syntax and / or semantics of information, clarifying constraints on information, or clarifying the semantics of information.

[0341] Examples of video decoding methods

[0342] The following describes video encoding and decoding methods according to various embodiments of the present invention. The video decoding method in Figure 5 may be performed by the video decoding device 200, and the video encoding method in Figure 6 may be performed by the video encoding device 100. Furthermore, the video decoding and encoding methods in Figures 5 and 6 may be based on the embodiments described above (including embodiments 1 to 3).

[0343] Figure 5 is a diagram illustrating a video decoding method that can be performed by a video decoding device according to one embodiment of the present disclosure.

[0344] First, as an example, post-filter-based output picture information for an input picture may be obtained (S510) from an NNPF (neural-network post-filter) related SEI (supplemental enhancement information) message (for example, an NNPFC (neural-network post-filter characteristics) SEI message and / or an NNPFA (neural-network post-filter activation) SEI message). The post-filter-based output picture information may include NNPF-related information, which may include the information described above with reference to the tables.

[0345] Subsequently, based on the obtained output picture information, the output picture for the input picture may be obtained (S520). In this case, if there is no output picture for the input picture (it is not generated), or if there is an output picture for the input picture but the output picture is not output, the output picture does not need to be obtained. On the other hand, if there is an output picture for the input picture and it is determined that the output picture will be output, the output picture may be obtained. Here, the output picture information may include output picture output information, which is information indicating whether or not an output picture for the input picture is output. The output picture output information may include, for example, nnpfa_output_flag[i]. Its value may be determined based on other information, and there may be constraints for determining the value. For example, the value of the output picture output information may be determined based on whether or not the input picture is available (for example, whether or not the input picture exists in the bitstream). The availability of an input picture may be determined based on input picture existence information, which indicates whether or not the input picture is available. For example, the input picture existence information may include inputPresentFlag, and it may be derived based on other information or other syntax. The case where an input picture is unavailable may include the case where the input picture does not exist in the bitstream.
If an input picture is unavailable (for example, if the input picture existence information indicates that the input picture is unavailable), the output picture output information may indicate that no output picture is output for that input picture. Also, the number of output picture output information items may be constrained: it may be determined to be a value within a specific range, or it may always be determined to be greater than a specific value (for example, 0). For example, the value of nnpfa_num_output_entries, which is information that can indicate the number of output picture output information items, may be determined to be a value within a specific range. The specific range may be determined based on the index value of a specific input picture; for example, the specific input picture may be the picture furthest from the current picture, i.e., the last input picture. Furthermore, the value of the output picture output information may be determined based on output picture existence information (i.e., output picture generation information), which is information indicating whether or not an output picture exists for an input picture (i.e., whether or not an output picture is generated), and / or based on the input picture existence information. For example, if an input picture is unavailable (e.g., the input picture does not exist in the bitstream) but an output picture for that input picture exists, the output picture does not need to be output.
More specifically, based on the input picture existence information indicating that the input picture is unavailable (e.g., the input picture does not exist in the bitstream) and the output picture existence information for the input picture indicating that the output picture exists (e.g., the value of nnpfc_input_pic_output_flag[i] is 1), the output picture output information can indicate that the output picture will not be output. For example, the fact that the output picture is not output may be represented by the output picture output information having a specific value (e.g., 0). In this case, for example, the output of that output picture may be excluded in the NNPF-based filtering process. Meanwhile, the output picture existence information may be signaled in an NNPFC SEI message, and the output picture output information may be signaled in an NNPFA SEI message.

[0346] Subsequently, although not shown in the diagram, the picture may be reconstructed based on the output picture information.

[0347] On the other hand, since the video decoding method shown in Figure 5 is one embodiment of the present disclosure, certain steps may be changed, the order of the steps may be changed, or some steps may be added or deleted, and it is clear that such modifications are also included in the present disclosure.

[0348] Examples of video encoding methods

[0349] Figure 6 is a diagram illustrating a video encoding method that can be performed by a video encoding device according to one embodiment of the present disclosure.

[0350] First, post-filter-based output picture information for the input picture may be determined (S610). The post-filter-based output picture information may include NNPF-related information, which may be as described by referring to the table above.

[0351] Subsequently, the output picture information may be signaled (S620) in an NNPF (neural-network post-filter) related SEI (supplemental enhancement information) message. As mentioned above, the NNPF-related SEI message may include an NNPFC SEI message and / or an NNPFA SEI message. It is possible that no output picture exists (is not generated) for the input picture, or that an output picture exists (is generated) but is not output. On the other hand, if an output picture exists for the input picture and it is decided that the output picture will be output, the output picture may be output as is. In this regard, output picture existence information, output picture output information indicating whether or not an output picture is output for the input picture, and / or input picture existence information may be included in the output picture information and signaled, or they may be inferred to have a specific value from other syntax. The output picture output information may include, for example, nnpfa_output_flag[i], and there may be specific constraints when determining its value. For example, the value of the output picture output information may be determined based on whether the input picture is available (i.e., whether the input picture exists in the bitstream or whether it is encoded). More specifically, the value of the output picture output information may be determined to be a specific value (e.g., 0) based on the determination that the input picture does not exist in the bitstream (i.e., it is unavailable on the decoder side). Whether the input picture exists in the bitstream may be determined based on other information, and may be represented by input picture existence information, for example inputPresentFlag, which may be derived based on other information or other syntax. Based on the determination that the input picture does not exist in the bitstream, the generated output picture (the result picture corresponding to the input picture) does not need to be output, and this can be indicated by the output picture output information.
Also, the number of output picture output information items may be constrained: it may be determined to be a value within a specific range, or it may always be determined to be greater than a specific value (e.g., 0). For example, the value of nnpfa_num_output_entries, which is information that can indicate the number of output picture output information items, may be determined to be a value within a specific range. The specific range may be determined based on the index value of a specific input picture; for example, the specific input picture may be the picture furthest from the current picture, i.e., the last input picture.
Furthermore, the value of the output picture output information may be determined based on whether or not an output picture exists for the input picture (i.e., whether or not an output picture is generated) and / or whether or not the input picture exists. In other words, it may be determined based on the value that the output picture existence information (i.e., output picture generation information) is determined to indicate and / or the value that the input picture existence information is determined to indicate. For example, if an input picture is unavailable (e.g., the input picture does not exist in the bitstream) but an output picture for that input picture exists, the output picture does not need to be output. More specifically, based on the input picture existence information being determined to indicate that the input picture is unavailable (e.g., the input picture is not present in the bitstream or is not encoded into the bitstream) and the output picture existence information for the input picture being determined to indicate that the output picture exists (e.g., the value of nnpfc_input_pic_output_flag[i] is determined to be 1), the output picture output information can indicate that the output picture will not be output. For example, the fact that the output picture is not output may be represented by the output picture output information having a specific value (e.g., 0). In this case, for example, the output of that output picture may be excluded in the NNPF-based filtering process.
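On the encoder side, the determination of the output picture output information can be sketched as follows. The function name `build_nnpfa_output_flags` and the `requested_output` parameter (the encoder's own preference per input picture) are hypothetical. As simplifying assumptions, a flag is signaled for every input picture and the lists are indexed directly by input picture index.

```python
def build_nnpfa_output_flags(input_present, requested_output):
    """Return (nnpfa_num_output_entries, list of nnpfa_output_flag values).

    input_present[i]    -- True if input picture i exists in the bitstream
    requested_output[i] -- True if the encoder would like picture i's output to be output
    """
    if len(input_present) != len(requested_output):
        raise ValueError("one entry per input picture is expected")
    flags = [
        # The flag is forced to 0 whenever the input picture is unavailable.
        1 if present and wanted else 0
        for present, wanted in zip(input_present, requested_output)
    ]
    return len(flags), flags


print(build_nnpfa_output_flags([True, False, True], [True, True, False]))
```

Signaling one flag per input picture is a conservative choice for the sketch: no flag for an unavailable picture is left to be inferred as 1, and the entry count is greater than 0 whenever there is at least one input picture.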

[0352] On the other hand, as an example, information about the existence of an output picture may be signaled by an NNPFC SEI message, and information about the output of an output picture may be signaled by an NNPFA SEI message.

[0353] Although not shown in the diagram, the picture may then be reconstructed based on the output picture information.

[0354] Furthermore, as an example, a computer-readable medium recording a bitstream generated by a video encoding method may be provided, and a method for transmitting a bitstream generated by a video encoding method may also be provided.

[0355] On the other hand, since the video encoding method shown in Figure 6 is one embodiment of the present disclosure, certain steps may be changed, the order of the steps may be changed, or some steps may be added or deleted, and it is clear that such modifications are also included in the present disclosure.

[0356] According to this invention, the meaning of information that may be included in VSEI can be clarified, thereby reducing decoder errors and enabling the representation of more accurate scenarios, and thereby improving coding quality. Furthermore, according to this invention, clear handling is possible for cases where a corresponding output picture is generated but the input picture does not exist, thereby improving coding efficiency.

[0357] Figure 7 illustrates a content streaming system to which the embodiments of this disclosure can be applied.

[0358] As shown in Figure 7, a content streaming system to which an embodiment of the present disclosure is applied may broadly include an encoding server, a streaming server, a web server, media storage, user equipment, and multimedia input devices.

[0359] The encoding server is responsible for compressing content input from multimedia input devices such as smartphones, cameras, and camcorders into digital data to generate a bitstream, and transmitting this bitstream to the streaming server. As another example, if a multimedia input device such as a smartphone, camera, or camcorder directly generates the bitstream, the encoding server may be omitted.

[0360] The bitstream may be generated by a video encoding method and / or video encoding apparatus to which an embodiment of the present disclosure is applied, and the streaming server may temporarily store the bitstream in the process of transmitting or receiving the bitstream.

[0361] The streaming server transmits multimedia data to user devices based on user requests via a web server, and the web server can act as an intermediary to inform users of available services. When a user requests a desired service from the web server, the web server transmits it to the streaming server, and the streaming server can transmit multimedia data to the user. In this case, the content streaming system may include a separate control server, in which case the control server can play a role in controlling commands and responses between the devices within the content streaming system.

[0362] The streaming server can receive content from media storage and / or encoding servers. For example, when receiving content from the encoding server, the content can be received in real time. In this case, in order to provide a smooth streaming service, the streaming server can store the bitstream for a certain period of time.

[0363] Examples of user devices include mobile phones, smartphones, laptop computers, digital broadcasting terminals, PDAs (personal digital assistants), PMPs (portable multimedia players), navigation systems, slate PCs, tablet PCs, ultrabooks, wearable devices (such as smartwatches, smart glasses, and HMDs), digital TVs, desktop computers, and digital signage.

[0364] Each server within the aforementioned content streaming system may be operated as a distributed server, in which case the data received by each server may be processed in a distributed manner.

[0365] The scope of this disclosure includes software or machine-executable instructions (e.g., operating systems, applications, firmware, programs, etc.) that cause the operations of the various embodiments to be performed on a device or computer, and a non-transitory computer-readable medium on which such software or instructions are stored and from which they can be executed on a device or computer.

[0366] [Industrial applicability] The embodiments described herein can be used for encoding / decoding video.

[0367] [Claims when filing an international application]

[Claim 1] A video decoding method performed by a video decoding device, the method comprising: a step of obtaining post-filter-based output picture information for an input picture from an NNPF (neural-network post-filter) related SEI (supplemental enhancement information) message; and a step of obtaining an output picture for the input picture based on the output picture information, wherein the output picture information includes output picture output information, which is information indicating whether or not an output picture for the input picture is output.

[Claim 2] The video decoding method according to claim 1, wherein the value of the output picture output information is determined based on whether or not the input picture exists.

[Claim 3] The video decoding method according to claim 2, wherein the value of the output picture output information is determined based on output picture existence information, which is information indicating whether or not the output picture exists for the input picture.

[Claim 4] The video decoding method according to claim 3, wherein, based on the input picture not existing and the output picture existence information indicating that the output picture exists, the output picture output information indicates that the output picture is not output.

[Claim 5] The video decoding method according to claim 4, wherein the number of output picture output information items is determined to be a value within a specific range.

[Claim 6] The video decoding method according to claim 5, wherein the specific range is determined based on the index value of a specific input picture.

[Claim 7] The video decoding method according to claim 4, wherein the number of output picture output information items is always determined to be greater than a specific value.

[Claim 8] The video decoding method according to claim 4, wherein, in the filtering process based on the NNPF, the output of the output picture for the non-existent input picture is excluded.

[Claim 9] The video decoding method according to claim 3, wherein the output picture existence information is obtained from an NNPFC SEI message.

[Claim 10] The video decoding method according to claim 1, wherein the output picture output information is obtained from an NNPFA SEI message.

[Claim 11] A video encoding method performed by a video encoding device, the method comprising: a step of determining post-filter-based output picture information for an input picture; and a step of signaling the output picture information in an NNPF (neural-network post-filter) related SEI (supplemental enhancement information) message, wherein the output picture information includes output picture output information, which is information indicating whether or not an output picture for the input picture is output.

[Claim 12] A computer-readable medium recording a bitstream generated by the video encoding method according to claim 11.

[Claim 13] A method for transmitting a bitstream generated by a video encoding method, wherein the video encoding method comprises: a step of determining post-filter-based output picture information for an input picture; and a step of signaling the output picture information in an NNPF (neural-network post-filter) related SEI (supplemental enhancement information) message, wherein the output picture information includes output picture output information, which is information indicating whether or not an output picture for the input picture is output.

Claims

1. A video decoding method performed by a video decoding device, the method comprising: a step of obtaining post-filter-based output picture information for an input picture from an NNPF (neural-network post-filter) related SEI (supplemental enhancement information) message; and a step of obtaining an output picture for the input picture based on the output picture information, wherein the output picture information includes output picture output information, which is information indicating whether or not an output picture for the input picture is output.

2. The video decoding method according to claim 1, wherein the value of the output picture output information is determined based on whether or not the input picture exists.

3. The video decoding method according to claim 2, wherein the value of the output picture output information is determined based on output picture existence information, which is information indicating whether or not the output picture exists for the input picture.

4. The video decoding method according to claim 3, wherein, based on the input picture not existing and the output picture existence information indicating that the output picture exists, the output picture output information indicates that the output picture is not output.

5. The video decoding method according to claim 4, wherein the number of output picture output information items is determined to be a value within a specific range.

6. The video decoding method according to claim 5, wherein the specific range is determined based on the index value of a specific input picture.

7. The video decoding method according to claim 4, wherein the number of output picture output information items is always determined to be greater than a specific value.

8. The video decoding method according to claim 4, wherein in the filtering process based on the NNPF, the output of the output picture for the non-existent input picture is excluded.

9. The video decoding method according to claim 3, wherein the output picture existence information is obtained from an NNPFC SEI message.

10. The video decoding method according to claim 1, wherein the output picture output information is obtained from an NNPFA SEI message.

11. A video encoding method performed by a video encoding device, the method comprising: a step of determining post-filter-based output picture information for an input picture; and a step of signaling the output picture information in an NNPF (neural-network post-filter) related SEI (supplemental enhancement information) message, wherein the output picture information includes output picture output information, which is information indicating whether or not an output picture for the input picture is output.

12. A computer-readable medium recording a bitstream generated by the video encoding method described in claim 11.

13. A method for transmitting a bitstream generated by a video encoding method, wherein the video encoding method comprises: a step of determining post-filter-based output picture information for an input picture; and a step of signaling the output picture information in an NNPF (neural-network post-filter) related SEI (supplemental enhancement information) message, wherein the output picture information includes output picture output information, which is information indicating whether or not an output picture for the input picture is output.