Methods for context-based video coding
By selecting probabilistic parameters for context model initialization in CABAC based on coding conditions, the method enhances coding efficiency and compression performance for video coding standards.
Patent Information
- Authority / Receiving Office
- JP · JP
- Patent Type
- Applications
- Current Assignee / Owner
- ALIBABA (CHINA) CO LTD
- Filing Date
- 2024-04-12
- Publication Date
- 2026-06-19
AI Technical Summary
Existing video coding standards face challenges in efficiently initializing context model probabilities for slices in context-based adaptive binary arithmetic coding (CABAC), which affects coding efficiency and compression performance.
A method for initializing context model probabilities for slices in CABAC by selecting a set of probabilistic parameters from predefined sets based on coding conditions or bitstream signals, enabling entropy decoding or encoding of B-slices.
Improves coding efficiency and compression performance by optimizing the initialization of context model probabilities, aligning with the goals of advanced video coding standards like VVC/H.266.
Description
Technical Field
[0001] (Cross-reference to Related Applications) This application claims priority to U.S. Provisional Application No. 63/496,012, filed on April 13, 2023; U.S. Provisional Application No. 63/618,884, filed on January 8, 2024; and U.S. Application No. 18/628,723, filed on April 6, 2024, the entire contents of all of which are incorporated herein by reference.
[0002] This disclosure generally relates to video processing, and more specifically, to methods and apparatuses for initializing a set of context model probabilities for slices in context-based adaptive binary arithmetic coding (CABAC).
Background Art
[0003] Video is a set of static pictures (or "frames") that capture visual information. In order to reduce memory and transmission bandwidth, video can be compressed before storage or transmission and decompressed before display. The compression process is usually called encoding, and the decompression process is usually called decoding. There are various video coding formats that use standardized video coding techniques, most commonly based on prediction, transformation, quantization, entropy coding, and in-loop filtering. Video coding standards such as the High Efficiency Video Coding (HEVC / H.265) standard, the Versatile Video Coding (VVC / H.266) standard, and the AVS standard, which define specific video coding formats, are developed by standardization organizations. As more advanced video coding techniques are adopted in video standards, the coding efficiency of new video coding standards becomes higher and higher.
Summary of the Invention
[0004] Embodiments of this disclosure provide a method and apparatus for initializing a context model probability set of a slice in context-based adaptive binary arithmetic coding (CABAC).
[0005] Several exemplary embodiments provide a decoding method which includes the steps of: selecting a first set of probabilistic parameters from a plurality of predefined sets of probabilistic parameters to initialize one or more context models of a B-slice; and performing entropy decoding of the B-slice based on the one or more context models and the first set of probabilistic parameters, wherein the step of selection is based on coding conditions of the B-slice or signals in the bitstream.
[0006] Several exemplary embodiments provide an encoding method which includes the steps of: selecting a first set of probabilistic parameters from a plurality of predefined sets of probabilistic parameters for initializing one or more context models of a B-slice; and performing entropy encoding of the B-slice based on the one or more context models and the first set of probabilistic parameters, wherein the step of selection is based on coding conditions for the B-slice or signals in the bitstream.
[0007] Several exemplary embodiments provide a non-transitory computer-readable storage medium for storing a bitstream of video. The bitstream is processed by a method that includes the steps of: selecting a first set of probabilistic parameters from a plurality of predefined sets of probabilistic parameters to initialize one or more context models of a B-slice; and performing entropy coding or decoding of the B-slice based on the one or more context models and the first set of probabilistic parameters, wherein the selection step is based on coding conditions for the B-slice or signals in the bitstream.
Brief Description of the Drawings
[0008] Embodiments and various aspects of this disclosure are shown in the following detailed description and accompanying drawings. The various features shown in the drawings are not drawn to scale.
[0009] [Figure 1] This is a schematic diagram showing the structure of an exemplary video sequence according to some embodiments of the present disclosure.
[0010] [Figure 2A] This is a schematic diagram illustrating an exemplary coding process of a hybrid video coding system according to some embodiments of the present disclosure.
[0011] [Figure 2B] This is a schematic diagram illustrating another exemplary coding process for a hybrid video coding system according to some embodiments of the present disclosure.
[0012] [Figure 3A] This is a schematic diagram illustrating an exemplary decoding process of a hybrid video coding system according to some embodiments of the present disclosure.
[0013] [Figure 3B] This is a schematic diagram illustrating another exemplary decoding process for a hybrid video coding system according to some embodiments of the present disclosure.
[0014] [Figure 4] This is a block diagram of an exemplary apparatus for encoding or decoding video, according to some embodiments of the present disclosure.
[0015] [Figure 5] This is a schematic diagram illustrating a context-based adaptive binary arithmetic coding (CABAC) engine according to some embodiments of the present disclosure.
[0016] [Figure 6] Shows an exemplary table of codewords used for binary coding according to some embodiments of the present disclosure.
[0017] [Figure 7] Shows an exemplary process for updating the Range variable and the Low variable in the Binary Arithmetic Encoding (BAE) stage of the CABAC engine of FIG. 5 according to some embodiments of the present disclosure.
[0018] [Figure 8] Shows an exemplary process for updating the Range variable and the Low variable in the BAE stage of the CABAC engine of FIG. 5 according to some embodiments of the present disclosure.
[0019] [Figure 9] Shows an exemplary set of context model probability parameters according to some embodiments of the present disclosure.
[0020] [Figure 10] Shows another exemplary set of context model probability parameters according to some embodiments of the present disclosure.
[0021] [Figure 11] Shows four exemplary sets of context model probability parameters according to some embodiments of the present disclosure.
[0022] [Figure 12] Shows a flowchart of an exemplary method for decoding a bitstream associated with a video according to some embodiments of the present disclosure.
[0023] [Figure 13] Shows a flowchart of another exemplary method for decoding a bitstream associated with a video according to some embodiments of the present disclosure.
[0024] [Figure 14] Shows a flowchart of an exemplary method for encoding a bitstream associated with a video according to some embodiments of the present disclosure.
[0025] [Figure 15] Shows a flowchart of another exemplary method for encoding a bitstream associated with a video according to some embodiments of the present disclosure.
Modes for Carrying Out the Invention
[0026] The following description refers in detail to exemplary embodiments, examples of which are shown in the accompanying drawings. Unless otherwise noted, the same numbers in different drawings represent identical or similar elements. The implementations set out in the following description of exemplary embodiments do not represent all embodiments of the present invention. Rather, they are merely examples of apparatus and methods according to the embodiments of the invention described in the claims. Specific embodiments of this disclosure will be described in more detail below. In the event of any conflict between terms or definitions incorporated by reference and those provided herein, the terms and definitions provided herein shall prevail.
[0027] This disclosure provides methods for initializing probabilistic parameters of a context model for use in context-based adaptive binary arithmetic coding (CABAC). In some disclosed embodiments, the initial probabilistic parameters may be selected from predefined sets of probabilistic parameters. In some embodiments, the predefined sets of probabilistic parameters may be pre-stored in the encoder and decoder, and the selection may be derived in both the encoder and decoder without explicit signaling. In some embodiments, the predefined sets of probabilistic parameters may be pre-stored in the encoder and decoder, and the selection may be signaled in the bitstream. In some embodiments, the initial probabilistic parameters may be selected without referring to a previously coded picture. In some embodiments, the initial probabilistic parameters may be selected based on the content of the current slice to which CABAC is applied.
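By way of a non-limiting illustration only, the selection described above can be sketched as follows. The table contents, the context-model names, the QP-based derivation rule, and the `signaled_index` argument are all hypothetical and are not taken from this disclosure; the sketch only shows that the same pre-stored tables can be indexed either by a derived condition or by a signaled value.

```python
from typing import Dict, List, Optional

# Hypothetical predefined sets, assumed to be stored identically in encoder and decoder.
# Each set maps a context-model name to an initial probability that the bin equals 1.
PREDEFINED_SETS: List[Dict[str, float]] = [
    {"split_flag": 0.50, "skip_flag": 0.50},
    {"split_flag": 0.35, "skip_flag": 0.70},
    {"split_flag": 0.65, "skip_flag": 0.30},
]


def select_set_index(slice_qp: int, signaled_index: Optional[int] = None) -> int:
    """Choose a set from a signaled index if present, otherwise from a coding condition."""
    if signaled_index is not None:
        # Explicit signaling: the index is assumed to have been parsed from the bitstream.
        if not 0 <= signaled_index < len(PREDEFINED_SETS):
            raise ValueError("signaled index out of range")
        return signaled_index
    # Implicit derivation (assumed rule): bucket the slice QP into three ranges.
    if slice_qp < 27:
        return 0
    if slice_qp < 37:
        return 1
    return 2


def init_context_models(slice_qp: int, signaled_index: Optional[int] = None) -> Dict[str, float]:
    """Return a fresh copy of the selected set to initialize the context models of a slice."""
    return dict(PREDEFINED_SETS[select_set_index(slice_qp, signaled_index)])
```

Returning a copy keeps the pre-stored tables unchanged while the context models adapt during coding of the slice.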
[0028] The Joint Video Experts Team (JVET), comprising the ITU-T Video Coding Experts Group (ITU-T VCEG) and the ISO/IEC Moving Picture Experts Group (ISO/IEC MPEG), is currently developing the Versatile Video Coding (VVC / H.266) standard. The VVC standard aims to double the compression efficiency of its predecessor, the High Efficiency Video Coding (HEVC / H.265) standard. In other words, the goal of VVC is to achieve the same subjective quality as HEVC / H.265 with half the bandwidth.
[0029] To achieve the same subjective quality as HEVC / H.265 with half the bandwidth, JVET has developed techniques that go beyond HEVC using Joint Exploration Model (JEM) reference software. As coding techniques were incorporated into JEM, JEM achieved substantially higher coding performance than HEVC.
[0030] The VVC standard is a recent development and continues to incorporate more coding techniques to provide better compression performance. VVC is based on the same hybrid video coding system used in modern video compression standards such as HEVC, H.264 / AVC, MPEG2, and H.263.
[0031] Video is a set of static pictures (or "frames") arranged in a temporal sequence to store visual information. A video capture device (e.g., a camera) can be used to capture and store these pictures in a temporal sequence, and a video playback device (e.g., a television, computer, smartphone, tablet computer, video player, or any end-user terminal with display capabilities) can be used to display these pictures in a temporal sequence. In some applications, a video capture device can also transmit the captured video in real time to a video playback device (e.g., a computer with a monitor) for purposes such as surveillance, conferencing, or live broadcasting.
[0032] To reduce the memory space and transmission bandwidth required for such applications, video can be compressed before storage and transmission, and decompressed before display. Compression and decompression can be implemented by software executed by a processor (e.g., a general-purpose computer processor) or by dedicated hardware. The module for compression is usually called an "encoder," and the module for decompression is usually called a "decoder." Encoders and decoders may be collectively referred to as a "codec." Encoders and decoders can be implemented as any suitable hardware, software, or a combination thereof. For example, a hardware implementation of an encoder and decoder may include circuits such as one or more microprocessors, digital signal processors (DSPs), application-specific integrated circuits (ASICs), field-programmable gate arrays (FPGAs), discrete logic, or any combination thereof. A software implementation of an encoder and decoder may include program code, computer executable instructions, firmware, or any suitable computer implementation algorithm or process fixed on a computer-readable medium. Video compression and decompression can be implemented using various algorithms or standards such as MPEG-1, MPEG-2, MPEG-4, and the H.26x series. In some applications, a codec can decompress video from a first coding standard and then recompress the decompressed video using a second coding standard; in this case, the codec can be called a "transcoder."
[0033] A video encoding process can identify and retain useful information that can be used to reconstruct a picture, while ignoring information that is not important for reconstruction. If the ignored, non-essential information cannot be fully reconstructed, such an encoding process can be called "lossy." Otherwise, it can be called "lossless." Most encoding processes are lossy, which is a trade-off to reduce the required memory space and transmission bandwidth.
[0034] Useful information about a picture being encoded (called the "current picture") includes changes relative to a reference picture (e.g., a previously encoded and reconstructed picture). Such changes can include changes in pixel position, brightness, or color, of which position changes are of most interest. Changes in the position of a group of pixels representing an object can reflect the movement of the object between the reference picture and the current picture.
[0035] A picture coded without referencing another picture (i.e., it is its own reference picture) is called an "I-picture". A picture is called a "P-picture" if some or all of its blocks (a block generally being a part of a video picture) are predicted from one reference picture using intra-prediction or inter-prediction (e.g., uni-prediction). A picture is called a "B-picture" if at least one of its blocks is predicted from two reference pictures (e.g., bi-prediction).
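As a non-limiting illustration, the three picture types can be told apart from the number of reference pictures used by each block. The function name and the per-block representation below are assumptions made only for this sketch.

```python
from typing import List


def classify_picture(refs_per_block: List[int]) -> str:
    """Classify a picture from the number of reference pictures each block uses.

    0 means the block is intra-predicted, 1 means uni-prediction, 2 means bi-prediction.
    """
    if any(n not in (0, 1, 2) for n in refs_per_block):
        raise ValueError("each block uses 0, 1, or 2 reference pictures")
    if 2 in refs_per_block:
        return "B"  # at least one block is bi-predicted
    if 1 in refs_per_block:
        return "P"  # some blocks use one reference picture, none use two
    return "I"      # no block refers to another picture
```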
[0036] Figure 1 shows the structure of an exemplary video sequence 100 according to some embodiments of the present disclosure. The video sequence 100 may be live video or captured and archived video. The video sequence 100 may be real-life video, computer-generated video (e.g., computer game video), or a combination thereof (e.g., real-life video with augmented reality effects). The video sequence 100 can be input from a video capture device (e.g., a camera), a video archive containing pre-captured video (e.g., video files stored in a storage device), or a video feed interface (e.g., a video broadcast transceiver) that receives video from a video content provider.
[0037] As shown in Figure 1, the video sequence 100 may include a series of pictures arranged in time along a timeline, including pictures 102, 104, 106, and 108. Pictures 102-106 are consecutive, and there are more pictures between pictures 106 and 108. In Figure 1, picture 102 is an I picture, and its reference picture is picture 102 itself. Picture 104 is a P picture, and its reference picture is picture 102, as indicated by the arrow. Picture 106 is a B picture, and its reference pictures are pictures 104 and 108, as indicated by the arrows. In some embodiments, the reference picture of a picture (e.g., picture 104) may not be immediately before or after the picture. For example, the reference picture of picture 104 may be a picture preceding picture 102. Note that the reference pictures of pictures 102-106 are merely examples, and this disclosure does not limit embodiments of reference pictures to the examples shown in Figure 1.
[0038] Typically, video codecs do not encode or decode an entire picture at once, due to the computational complexity of such a task. Rather, they can divide the picture into basic segments and encode or decode the picture segment by segment. Such basic segments are referred to in this disclosure as basic processing units ("BPUs"). For example, structure 110 in Figure 1 illustrates an exemplary structure of a picture (e.g., any of pictures 102-108) in video sequence 100. In structure 110, the picture is divided into 4x4 basic processing units, with their boundaries indicated by dashed lines. In some embodiments, basic processing units may be referred to as "macroblocks" in some video coding standards (e.g., the MPEG family, H.261, H.263, or H.264 / AVC) or as "coding tree units" ("CTUs") in some other video coding standards (e.g., H.265 / HEVC or H.266 / VVC). The basic processing units can have variable sizes within a picture, for example, 128×128, 64×64, 32×32, 16×16, 4×8, or 16×32 pixels, or any other shape and size. The size and shape of the basic processing units can be selected for a picture based on a balance between coding efficiency and the level of detail to be retained in the basic processing unit.
[0039] A basic processing unit can be a logic unit that may contain groups of different types of video data stored in computer memory (e.g., a video frame buffer). For example, a basic processing unit for a color picture may include a luma component (Y) representing achromatic brightness information, one or more chroma components (e.g., Cb, Cr) representing color information, and associated syntax elements, of which the luma and chroma components may have the same size as the basic processing unit. The luma and chroma components may be called "coding tree blocks" ("CTBs") in some video coding standards (e.g., H.265 / HEVC or H.266 / VVC). Any operation performed on a basic processing unit can be repeated for each of its luma and chroma components.
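As a non-limiting illustration, dividing a picture into square basic processing units in raster order can be sketched as follows. The function name is hypothetical, and cropping partial units at the right and bottom edges is an assumption of this sketch rather than a statement about any particular standard.

```python
from typing import List, Tuple

# A unit is described as (x, y, width, height) in pixels.
Unit = Tuple[int, int, int, int]


def partition_into_units(width: int, height: int, unit_size: int) -> List[Unit]:
    """Split a picture into unit_size x unit_size units, listed row by row."""
    if width <= 0 or height <= 0 or unit_size <= 0:
        raise ValueError("dimensions and unit size must be positive")
    units: List[Unit] = []
    for y in range(0, height, unit_size):
        for x in range(0, width, unit_size):
            # Units on the right or bottom edge are cropped to the picture boundary.
            units.append((x, y, min(unit_size, width - x), min(unit_size, height - y)))
    return units
```

For example, a 256×256 picture with a unit size of 64 yields the 4x4 arrangement of structure 110.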
[0040] Video coding involves multiple operational stages, examples of which are shown in Figures 2A-2B and 3A-3B. In each stage, even the size of the basic processing unit may be too large for processing, and therefore it can be further divided into segments referred to in this disclosure as “basic processing subunits.” In some embodiments, a basic processing subunit may be referred to as a “block” in some video coding standards (e.g., the MPEG family, H.261, H.263, or H.264 / AVC) or as a “coding unit” (“CU”) in some other video coding standards (e.g., H.265 / HEVC or H.266 / VVC). A basic processing subunit may be the same size as or smaller than a basic processing unit. Similar to a basic processing unit, a basic processing subunit is a logic unit that may contain a group of different types of video data (e.g., Y, Cb, Cr, and associated syntax elements) stored in computer memory (e.g., a video frame buffer). Any operation performed on a basic processing subunit can be repeatedly executed on both its luma and chroma components. Furthermore, such divisions can be extended to further levels depending on processing needs. It should also be noted that different schemes can be used to divide the basic processing unit at different stages.
[0041] For example, in the mode determination stage (one example of which is shown in Figure 2B), the encoder can determine which prediction mode (e.g., intra-picture prediction or inter-picture prediction) to use for a basic processing unit, which may be too large to make such a decision. The encoder can divide the basic processing unit into multiple basic processing subunits (e.g., CUs in H.265 / HEVC or H.266 / VVC) and determine the prediction type for each individual basic processing subunit.
[0042] As another example, in the prediction stage (an example of which is shown in Figures 2A-2B), the encoder can perform prediction operations at the level of basic processing subunits (e.g., CUs). However, in some cases, even basic processing subunits may be too large for processing. The encoder can further divide the basic processing subunits into smaller segments (e.g., called "prediction blocks" or "PBs" in H.265 / HEVC or H.266 / VVC), at the level of which prediction operations can be performed.
[0043] As another example, in the transformation stage (an example of which is shown in Figures 2A-2B), the encoder can perform transformation operations on residual basic processing subunits (e.g., CUs). However, in some cases, even a basic processing subunit may be too large for processing. The encoder can further divide the basic processing subunit into smaller segments (e.g., called "transform blocks" or "TBs" in H.265 / HEVC or H.266 / VVC), at the level of which transformation operations can be performed. Note that the division scheme for the same basic processing subunit may differ between the prediction stage and the transformation stage. For example, in H.265 / HEVC or H.266 / VVC, the prediction blocks and transform blocks of the same CU may have different sizes and numbers.
[0044] In the structure 110 of Figure 1, the basic processing unit 112 is further divided into 3x3 basic processing subunits, with their boundaries indicated by dotted lines. Different basic processing units of the same picture can be divided into basic processing subunits in different schemes.
[0045] In some embodiments, to provide parallel processing and error tolerance for video encoding and decoding, a picture can be divided into several regions for processing, such that the encoding or decoding process does not depend on information from any other region of the picture. In other words, each region of the picture can be processed independently. In this way, the codec can process different regions of the picture in parallel, thereby improving coding efficiency. Furthermore, if data in a region is corrupted during processing or lost during network transmission, the codec can correctly encode or decode other regions of the same picture without relying on the corrupted or lost data, thereby providing error tolerance. In some video coding standards, a picture can be divided into different types of regions. For example, H.265 / HEVC and H.266 / VVC offer two types of regions: "slices" and "tiles". Note that different pictures in video sequence 100 may have different partitioning schemes for dividing the picture into several regions.
[0046] For example, in Figure 1, structure 110 is divided into three regions 114, 116, and 118, with their boundaries indicated by solid lines within structure 110. Region 114 contains four basic processing units. Regions 116 and 118 each contain six basic processing units. Note that the basic processing units, basic processing subunits, and regions of structure 110 in Figure 1 are merely examples, and this disclosure is not limited to these embodiments.
[0047] Figure 2A shows a schematic diagram of an exemplary encoding process 200A according to an embodiment of the present disclosure. For example, the encoding process 200A may be performed by an encoder. As shown in Figure 2A, the encoder can encode a video sequence 202 into a video bitstream 228 by process 200A. Similar to video sequence 100 in Figure 1, video sequence 202 may include a set of pictures arranged in chronological order (referred to as “original pictures”). Similar to structure 110 in Figure 1, each original picture in video sequence 202 may be divided by the encoder into basic processing units, basic processing subunits, or regions for processing. In some embodiments, the encoder can perform process 200A at the level of a basic processing unit for each original picture in video sequence 202. For example, the encoder can perform process 200A iteratively, encoding one basic processing unit in each iteration of process 200A. In some embodiments, the encoder can execute process 200A in parallel for the regions (e.g., regions 114-118) of each original picture in the video sequence 202.
[0048] In Figure 2A, the encoder can generate prediction data 206 and predicted BPU 208 by supplying a basic processing unit (called the "original BPU") of an original picture of the video sequence 202 to the prediction stage 204. The encoder can generate residual BPU 210 by subtracting predicted BPU 208 from the original BPU. The encoder can generate quantized transformation coefficients 216 by supplying residual BPU 210 to the transformation stage 212 and the quantization stage 214. The encoder can generate video bitstream 228 by supplying prediction data 206 and quantized transformation coefficients 216 to the binary coding stage 226. Components 202, 204, 206, 208, 210, 212, 214, 216, 226, and 228 can be called the "forward path". In process 200A, after the quantization stage 214, the encoder can generate a reconstructed residual BPU 222 by supplying the quantized transformation coefficients 216 to the inverse quantization stage 218 and the inverse transformation stage 220. The encoder can generate a prediction reference 224, which will be used in the prediction stage 204 for the next iteration of process 200A, by adding the reconstructed residual BPU 222 to the predicted BPU 208. Components 218, 220, 222, and 224 of process 200A can be called the “reconstruction path”. The reconstruction path can be used to ensure that both the encoder and the decoder use the same reference data for prediction.
[0049] By iteratively performing process 200A, the encoder can encode each original BPU of the original picture (in the forward path) and generate a prediction reference 224 for encoding the next original BPU of the original picture (in the reconstruction path). After encoding all original BPUs of the original picture, the encoder can proceed to encoding the next picture in the video sequence 202.
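As a non-limiting illustration, the iteration over BPUs can be sketched with a toy model. The following are assumptions of the sketch only: each BPU is a short list of integer samples, the prediction simply copies the prediction reference, the transformation stage is omitted, and the step size `STEP` is an arbitrary value. The point of the sketch is that the encoder predicts from reconstructed data, so a decoder repeating the reconstruction obtains the same reference.

```python
from typing import List

STEP = 4  # hypothetical quantization step, chosen only for this sketch


def quantize(residual: List[int]) -> List[int]:
    return [round(r / STEP) for r in residual]


def dequantize(levels: List[int]) -> List[int]:
    return [q * STEP for q in levels]


def encode(bpus: List[List[int]]) -> List[List[int]]:
    """Toy encoder: returns the quantized residual of each BPU."""
    if not bpus:
        return []
    reference = [0] * len(bpus[0])  # prediction reference before the first iteration
    coded = []
    for original in bpus:
        predicted = reference                                    # toy prediction stage
        residual = [o - p for o, p in zip(original, predicted)]  # residual BPU
        levels = quantize(residual)                              # forward path
        coded.append(levels)
        recon_residual = dequantize(levels)                      # reconstruction path
        reference = [p + r for p, r in zip(predicted, recon_residual)]
    return coded


def decode(coded: List[List[int]]) -> List[List[int]]:
    """Toy decoder: mirrors the encoder's reconstruction path."""
    if not coded:
        return []
    reference = [0] * len(coded[0])
    pictures = []
    for levels in coded:
        reference = [p + r for p, r in zip(reference, dequantize(levels))]
        pictures.append(reference)
    return pictures
```

Because each prediction starts from the reconstructed reference rather than from the original samples, the quantization error in this toy model does not accumulate from one BPU to the next.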
[0050] Referring to process 200A, the encoder may receive a video sequence 202 generated by a video capture device (e.g., a camera). As used herein, “receive” can mean receiving, inputting, acquiring, retrieving, obtaining, reading, accessing, or any action in any way for inputting data.
[0051] In prediction stage 204, in the current iteration, the encoder receives the original BPU and prediction reference 224 and can generate prediction data 206 and predicted BPU 208 by performing prediction operations. The prediction reference 224 can be generated from the reconstruction path of the previous iteration of process 200A. The purpose of prediction stage 204 is to reduce information redundancy by extracting prediction data 206 that, together with prediction reference 224, can be used to reconstruct the original BPU as predicted BPU 208.
[0052] Ideally, the predicted BPU 208 could be identical to the original BPU. However, because prediction and reconstruction operations are not ideal, the predicted BPU 208 is generally slightly different from the original BPU. To record such differences, the encoder can generate the residual BPU 210 by subtracting the predicted BPU 208 from the original BPU after generating the predicted BPU 208. For example, the encoder can subtract the pixel values (e.g., grayscale values or RGB values) of the predicted BPU 208 from the corresponding pixel values of the original BPU. Each pixel of the residual BPU 210 may have a residual value as a result of such a subtraction between the corresponding pixels of the original BPU and the predicted BPU 208. Compared to the original BPU, the prediction data 206 and residual BPU 210 may have fewer bits, but they can be used to reconstruct the original BPU without causing significant quality degradation. This compresses the original BPU.
[0053] To further compress the residual BPU 210, in the transformation stage 212, the encoder can reduce the spatial redundancy of the residual BPU 210 by decomposing it into a set of two-dimensional "base patterns," each associated with a "transformation coefficient." The base patterns can have the same size (e.g., the size of the residual BPU 210). Each base pattern can represent one variation frequency component (e.g., a frequency of luminance variation) of the residual BPU 210. No base pattern can be reconstructed from any combination (e.g., a linear combination) of the other base patterns. In other words, the decomposition can decompose the variations of the residual BPU 210 into the frequency domain. Such a decomposition is analogous to the discrete Fourier transform of a function, where the base patterns are analogous to the basis functions of the discrete Fourier transform (e.g., trigonometric functions), and the transformation coefficients are analogous to the coefficients associated with the basis functions.
[0054] Different transformation algorithms can use different base patterns. Various transformation algorithms can be used in transformation stage 212, such as discrete cosine transforms and discrete sine transforms. The transformation in transformation stage 212 is reversible; that is, the encoder can reconstruct the residual BPU 210 by performing the reverse transformation (called the "inverse transformation"). For example, to reconstruct the pixels of the residual BPU 210, the inverse transformation can generate a weighted sum by multiplying the values of the corresponding pixels in the base pattern by the respective coefficients involved and adding the products. Both the encoder and decoder can use the same transformation algorithm (and thus the same base pattern) for the video coding standard. Therefore, the encoder can record only the transformation coefficients, and from these coefficients, the decoder can reconstruct the residual BPU 210 without receiving the base pattern from the encoder. Compared to the residual BPU 210, the transformation coefficients may have fewer bits, but they can be used to reconstruct the residual BPU 210 without significant quality degradation. This further compresses the residual BPU 210.
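As a non-limiting illustration, the decomposition into base patterns and the weighted-sum reconstruction can be sketched with an orthonormal type-II discrete cosine transform. The choice of this particular transform, the 4×4 block size used in the example, and all function names are assumptions of the sketch; practical codecs use integer approximations and fast algorithms instead of the direct sums shown here.

```python
import math
from typing import List

Matrix = List[List[float]]


def dct_rows(n: int) -> Matrix:
    """Orthonormal DCT-II rows: row k is a 1-D cosine pattern with frequency index k."""
    rows = []
    for k in range(n):
        scale = math.sqrt((1.0 if k == 0 else 2.0) / n)
        rows.append([scale * math.cos(math.pi * (2 * i + 1) * k / (2 * n)) for i in range(n)])
    return rows


def base_patterns(n: int) -> List[List[Matrix]]:
    """patterns[u][v] is the n-by-n base pattern for vertical index u and horizontal index v."""
    rows = dct_rows(n)
    return [[[[rows[u][y] * rows[v][x] for x in range(n)] for y in range(n)]
             for v in range(n)] for u in range(n)]


def forward_transform(block: Matrix) -> Matrix:
    """Each coefficient is the inner product of the block with one base pattern."""
    n = len(block)
    pats = base_patterns(n)
    return [[sum(block[y][x] * pats[u][v][y][x] for y in range(n) for x in range(n))
             for v in range(n)] for u in range(n)]


def inverse_transform(coeffs: Matrix) -> Matrix:
    """Each pixel is the sum of the base patterns weighted by their coefficients."""
    n = len(coeffs)
    pats = base_patterns(n)
    return [[sum(coeffs[u][v] * pats[u][v][y][x] for u in range(n) for v in range(n))
             for x in range(n)] for y in range(n)]
```

Because both sides derive the same base patterns from the block size alone, only the coefficients need to be conveyed. A flat block, for instance, produces a single non-zero coefficient at index (0, 0).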
[0055] The encoder can further compress the transformation coefficients in the quantization stage 214. In the transformation process, different base patterns can represent different fluctuation frequencies (e.g., luminance fluctuation frequencies). Since the human eye is generally better at recognizing low-frequency fluctuations, the encoder can ignore information on high-frequency fluctuations without significantly degrading the decoding quality. For example, in the quantization stage 214, the encoder can generate quantized transformation coefficients 216 by dividing each transformation coefficient by an integer value (called the "quantization scale factor") and rounding the quotient to the nearest integer. After such an operation, some transformation coefficients of high-frequency base patterns can be converted to zero, and transformation coefficients of low-frequency base patterns can be converted to smaller integers. The encoder can ignore the zero-valued quantized transformation coefficients 216, thereby further compressing the transformation coefficients. The quantization process also has an inverse operation (called "inverse quantization"), in which the quantized transformation coefficients 216 are reconstructed into approximations of the transformation coefficients.
[0056] Because the encoder discards the remainder of such division in its rounding operation, the quantization stage 214 can be lossy. Typically, the quantization stage 214 can lead to the greatest information loss in process 200A. The greater the information loss, the fewer bits the quantized transformation coefficients 216 may require. To obtain different levels of information loss, the encoder can use different values for the quantization parameter or any other parameter of the quantization process.
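As a non-limiting illustration, division by a scale factor followed by rounding, and the corresponding inverse operation, can be sketched as follows. The coefficient values and the scale factors 10 and 20 are hypothetical and were chosen only so that no quotient falls exactly halfway between two integers.

```python
from typing import List


def quantize(coeffs: List[float], scale: int) -> List[int]:
    """Divide each coefficient by the scale factor and round to the nearest integer."""
    return [round(c / scale) for c in coeffs]


def dequantize(levels: List[int], scale: int) -> List[int]:
    """Inverse quantization: multiply back; the rounded-off remainder is not recovered."""
    return [q * scale for q in levels]


coeffs = [220.0, -37.0, 9.0, 3.0]  # hypothetical values, ordered from low to high frequency
for scale in (10, 20):
    levels = quantize(coeffs, scale)
    print(scale, levels, dequantize(levels, scale))
```

With these values, the larger scale factor turns two of the four coefficients into zero instead of one, at the cost of a larger reconstruction error.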
[0057] In the binary coding stage 226, the encoder can encode the prediction data 206 and the quantized transformation coefficients 216 using binary coding techniques such as context-based adaptive binary arithmetic coding (CABAC), entropy coding, variable-length coding, arithmetic coding, Huffman coding, or any other lossless or lossy compression algorithm.
[0058] For example, the CABAC encoding process in binary coding stage 226 may include a binarization step, a context modeling step, and a binary arithmetic coding step. If a syntax element is not binary, the encoder first maps the syntax element to a binary sequence. The encoder may select either a context coding mode or a bypass coding mode for coding. In some embodiments, in context coding mode, a probabilistic model for the binary symbol (bin) to be encoded is selected by the “context,” which refers to previously encoded syntax elements. The bin and the selected context model are then passed to an arithmetic coding engine that encodes the bin and updates the corresponding probability distribution of the context model. In some embodiments, in bypass coding mode, the bin is encoded with a fixed probability (e.g., a probability equal to 0.5) without selecting a probabilistic model by the “context.” In some embodiments, the bypass coding mode is selected for particular bins to accelerate the entropy coding process with little loss of coding efficiency.
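As a non-limiting illustration, the difference between the context coding mode and the bypass coding mode can be sketched by comparing ideal code lengths. This is not an arithmetic coder: it only accumulates the ideal cost of -log2(p) bits per bin. The unary binarization, the floating-point probability state, the adaptation rate of 1/16, and all names are assumptions of the sketch.

```python
import math
from typing import List


def unary_binarize(value: int) -> List[int]:
    """Map a non-negative integer to bins: `value` ones followed by a terminating zero."""
    if value < 0:
        raise ValueError("value must be non-negative")
    return [1] * value + [0]


class ContextModel:
    """Adaptive estimate of the probability that the next bin equals 1."""

    def __init__(self, p_one: float = 0.5, rate: float = 1 / 16) -> None:
        self.p_one = p_one
        self.rate = rate

    def cost_and_update(self, bin_value: int) -> float:
        """Return the ideal cost of the bin in bits, then adapt toward the observed value."""
        p = self.p_one if bin_value == 1 else 1.0 - self.p_one
        bits = -math.log2(p)
        self.p_one += self.rate * (bin_value - self.p_one)
        # Keep the estimate strictly inside (0, 1) so the cost stays finite.
        self.p_one = min(max(self.p_one, 1e-6), 1.0 - 1e-6)
        return bits


def context_cost(bins: List[int], model: ContextModel) -> float:
    """Total ideal cost when every bin is coded with the adaptive model."""
    return sum(model.cost_and_update(b) for b in bins)


def bypass_cost(bins: List[int]) -> float:
    """Bypass mode: a fixed probability of 0.5, so each bin costs exactly one bit."""
    return float(len(bins))
```

For a run of identical bins, the adaptive estimate moves toward the observed value and the context-coded cost falls below one bit per bin, whereas the bypass cost stays at one bit per bin. The initial value of `p_one` plays the role of the initialization that this disclosure is concerned with.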
[0059] In some embodiments, the encoder may encode other information in the binary coding stage 226, in addition to the prediction data 206 and the quantized transform coefficients 216, such as the prediction mode used in the prediction stage 204, parameters of the prediction operation, the transform type in the transform stage 212, parameters of the quantization process (e.g., quantization parameters), and encoder control parameters (e.g., bitrate control parameters). The encoder can generate a video bitstream 228 using the output data from the binary coding stage 226. In some embodiments, the video bitstream 228 may be further packetized for network transmission.
[0060] Referencing the reconstruction path of process 200A, in the inverse quantization stage 218, the encoder can generate reconstruction transformation coefficients by performing inverse quantization on the quantization transformation coefficients 216. In the inverse transformation stage 220, the encoder can generate reconstruction residual BPU 222 based on the reconstruction transformation coefficients. The encoder can generate a prediction reference 224 to be used in the next iteration of process 200A by adding the reconstruction residual BPU 222 to the prediction BPU 208.
[0061] Furthermore, the video sequence 202 can be encoded using other variations of process 200A. In some embodiments, the stages of process 200A may be executed in a different order by the encoder. In some embodiments, one or more stages of process 200A may be combined into a single stage. In some embodiments, a single stage of process 200A may be divided into multiple stages. For example, the transform stage 212 and the quantization stage 214 may be combined into a single stage. In some embodiments, process 200A may include additional stages. In some embodiments, process 200A may omit one or more stages in Figure 2A.
[0062] Figure 2B shows a schematic diagram of another exemplary coding process 200B according to an embodiment of the present disclosure. Process 200B may be a modification of process 200A. For example, process 200B can be used by an encoder compliant with a hybrid video coding standard (e.g., the H.26x series). Compared to process 200A, the forward path of process 200B further includes a mode determination stage 230 and divides the prediction stage 204 into a spatial prediction stage 2042 and a temporal prediction stage 2044. The reconstruction path of process 200B further includes a loop filter stage 232 and a buffer 234.
[0063] Generally, prediction techniques can be classified into two types: spatial prediction and temporal prediction. Spatial prediction (e.g., intra-picture prediction or "intra-prediction") can predict the current BPU using pixels from one or more already coded adjacent BPUs in the same picture. That is, prediction reference 224 in spatial prediction may include adjacent BPUs. Spatial prediction can reduce the inherent spatial redundancy of a picture. Temporal prediction (e.g., inter-picture prediction or "inter-prediction") can predict the current BPU using regions from one or more already coded pictures. That is, prediction reference 224 in temporal prediction may include coded pictures. Temporal prediction can reduce the inherent temporal redundancy of a picture.
[0064] Referring to process 200B, in the forward path, the encoder performs prediction operations in the spatial prediction stage 2042 and the temporal prediction stage 2044. For example, in the spatial prediction stage 2042, the encoder can perform intra-prediction. For the original BPU of the picture being encoded, the prediction reference 224 may include one or more adjacent BPUs that have been encoded (in the forward path) and reconstructed (in the reconstruction path) in the same picture. The encoder can generate the prediction BPU 208 by extrapolating the adjacent BPUs. Extrapolation techniques may include, for example, linear extrapolation or interpolation, polynomial extrapolation or interpolation, etc. In some embodiments, the encoder can perform pixel-level extrapolation, such as by extrapolating the corresponding pixel value for each pixel of the prediction BPU 208. The adjacent BPUs used for extrapolation can be positioned relative to the original BPU in various directions, such as vertically (e.g., above the original BPU), horizontally (e.g., to the left of the original BPU), diagonally (e.g., lower left, lower right, upper left, or upper right of the original BPU), or in any direction defined by the video coding standard used. For intra-prediction, the prediction data 206 may include, for example, the position (e.g., coordinates) of the adjacent BPUs used, the size of the adjacent BPUs used, the extrapolation parameters, and the orientation of the adjacent BPUs used relative to the original BPU.
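As a non-limiting illustration of pixel-level extrapolation, the following Python sketch copies the reconstructed pixels bordering the block either downward (vertical direction) or rightward (horizontal direction). The function names and the sample pixel values are hypothetical; practical codecs define many more directions and may filter the border pixels first.

```python
def predict_vertical(above_row, height):
    """Extend the row of reconstructed pixels above the block down through every row."""
    return [list(above_row) for _ in range(height)]


def predict_horizontal(left_column, width):
    """Extend the column of reconstructed pixels left of the block across every column."""
    return [[pixel] * width for pixel in left_column]


# Hypothetical border pixels taken from already reconstructed adjacent BPUs.
vertical_prediction = predict_vertical([100, 102, 104, 106], height=2)
horizontal_prediction = predict_horizontal([90, 95], width=3)
```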
[0065] For example, in the temporal prediction stage 2044, the encoder can perform inter-prediction. For the original BPU of the current picture, the prediction reference 224 may include one or more pictures (referred to as "reference pictures") that have been encoded (in the forward path) and reconstructed (in the reconstruction path). In some embodiments, the reference pictures can be encoded and reconstructed BPU by BPU. For example, the encoder can generate a reconstructed BPU by adding the reconstructed residual BPU 222 to the prediction BPU 208. Once all the reconstructed BPUs for the same picture have been generated, the encoder can generate a reconstructed picture as a reference picture. The encoder can perform a "motion estimation" operation to search for a matching region within a range of the reference picture (referred to as a "search window"). The position of the search window in the reference picture can be determined based on the position of the original BPU in the current picture. For example, the search window can be centered at a position in the reference picture that has the same coordinates as the original BPU in the current picture and can extend outward by a predetermined distance. When the encoder identifies a region similar to the original BPU in the search window (e.g., by a pel-recursive algorithm, a block-matching algorithm, etc.), the encoder can determine such a region as a matching region. The matching region may have different dimensions from the original BPU (e.g., smaller than the original BPU, equal to the original BPU, larger than the original BPU, or a different shape from the original BPU). Because the reference picture and the current picture are temporally separated on the timeline (e.g., as shown in Figure 1), the matching region can be considered to "move" to the original BPU's position over time. The encoder can record the direction and distance of such movement as a "motion vector".
If multiple reference pictures are used (e.g., picture 106 in Figure 1), the encoder can search for a matching region and determine its associated motion vector for each reference picture. In some embodiments, the encoder can assign weights to the pixel values of the matching regions of the respective reference pictures.
[0066] Motion estimation can be used to identify various types of motion, such as translation, rotation, and zoom. For inter-prediction, the prediction data 206 may include, for example, the position (e.g., coordinates) of the matching region, the motion vector associated with the matching region, the number of reference pictures, and the weights associated with the reference pictures.
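As a non-limiting illustration of the motion estimation described above, the following Python sketch performs an exhaustive block-matching search inside a search window. The sum of absolute differences (SAD) as the similarity measure, the square block shape, the function names, and the convention that the motion vector points from the original BPU's position to the matching region are all assumptions made for illustration and are not details of this disclosure.

```python
def get_block(picture, top, left, size):
    """Return the size x size block whose top-left pixel is at (top, left)."""
    return [row[left:left + size] for row in picture[top:top + size]]


def sad(block_a, block_b):
    """Sum of absolute differences between two equally sized blocks."""
    return sum(abs(a - b)
               for row_a, row_b in zip(block_a, block_b)
               for a, b in zip(row_a, row_b))


def motion_search(current, reference, top, left, size, search_range):
    """Search a window centered on the co-located position; return (motion_vector, cost)."""
    target = get_block(current, top, left, size)
    height, width = len(reference), len(reference[0])
    best_cost, best_vector = None, None
    for dy in range(-search_range, search_range + 1):
        for dx in range(-search_range, search_range + 1):
            y, x = top + dy, left + dx
            if y < 0 or x < 0 or y + size > height or x + size > width:
                continue  # candidate region falls outside the reference picture
            cost = sad(target, get_block(reference, y, x, size))
            if best_cost is None or cost < best_cost:
                best_cost, best_vector = cost, (dy, dx)
    return best_vector, best_cost


# Hypothetical pictures: the current picture is the reference shifted down 1 and right 2.
reference = [[y * 8 + x for x in range(8)] for y in range(8)]
current = [[reference[(y - 1) % 8][(x - 2) % 8] for x in range(8)] for y in range(8)]
motion_vector, cost = motion_search(current, reference, top=3, left=3, size=2, search_range=2)
```

Here the search recovers the displacement (-1, -2) with zero matching cost, since the content at the block position in the current picture originated one row up and two columns to the left in the reference picture.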
[0067] To generate a prediction BPU 208, the encoder can perform a “motion compensation” operation. Motion compensation can be used to reconstruct the prediction BPU 208 based on the prediction data 206 (e.g., motion vectors) and the prediction reference 224. For example, the encoder can move the matching region of a reference picture according to the motion vector, thereby predicting the original BPU of the current picture. If multiple reference pictures are used (e.g., picture 106 in Figure 1), the encoder can move the matching region of each reference picture according to its respective motion vector and average the pixel values of the matching regions. In some embodiments, if the encoder has assigned weights to the pixel values of the matching regions of the respective reference pictures, the encoder can compute a weighted sum of the pixel values of the moved matching regions.
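As a non-limiting illustration, the weighted combination of matching regions from several reference pictures can be sketched in Python as follows. The function name, the integer-pixel motion vectors, and the sample pictures and weights are hypothetical; equal weights reduce the weighted sum to the plain average mentioned above.

```python
def motion_compensate(references, motion_vectors, weights, top, left, size):
    """Build a prediction block as the weighted sum of displaced reference regions."""
    prediction = [[0.0] * size for _ in range(size)]
    for reference, (dy, dx), weight in zip(references, motion_vectors, weights):
        for i in range(size):
            for j in range(size):
                # Fetch the pixel of the matching region displaced by the motion vector.
                prediction[i][j] += weight * reference[top + dy + i][left + dx + j]
    return prediction


# Two hypothetical flat reference pictures and one motion vector per reference.
reference_a = [[10] * 4 for _ in range(4)]
reference_b = [[20] * 4 for _ in range(4)]
averaged = motion_compensate([reference_a, reference_b], [(0, 1), (1, 0)],
                             [0.5, 0.5], top=1, left=1, size=2)
weighted = motion_compensate([reference_a, reference_b], [(0, 1), (1, 0)],
                             [0.25, 0.75], top=1, left=1, size=2)
```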
[0068] In some embodiments, inter-prediction may be unidirectional or bidirectional. Unidirectional inter-prediction can use one or more reference pictures that are in the same temporal direction relative to the current picture. For example, picture 104 in Figure 1 is a unidirectional inter-prediction picture in which the reference picture (e.g., picture 102) precedes picture 104. Bidirectional inter-prediction can use one or more reference pictures in both temporal directions relative to the current picture. For example, picture 106 in Figure 1 is a bidirectional inter-prediction picture in which the reference pictures (e.g., pictures 104, 108) are in both temporal directions relative to picture 106.
[0069] Still referring to the forward path of process 200B, after the spatial prediction stage 2042 and the temporal prediction stage 2044, in the mode determination stage 230, the encoder can select a prediction mode (e.g., one of intra-prediction or inter-prediction) for the current iteration of process 200B. For example, the encoder can perform a rate-distortion optimization technique, in which the encoder can minimize the value of the cost function by selecting a prediction mode based on the bitrate of the candidate prediction mode and the distortion of the reconstructed reference picture in the candidate prediction mode. Depending on the selected prediction mode, the encoder can generate the corresponding prediction BPU 208 and prediction data 206.
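As a non-limiting illustration of the mode determination, the following Python sketch selects the candidate that minimizes a cost function combining distortion and bitrate. The Lagrangian form J = D + λ·R, the value of λ, and the candidate figures are assumptions made for illustration; this disclosure does not specify the form of the cost function.

```python
def select_mode(candidates, lagrange_multiplier):
    """Return the candidate with the lowest cost J = distortion + lambda * bits."""
    return min(candidates,
               key=lambda c: c["distortion"] + lagrange_multiplier * c["bits"])


# Hypothetical measurements for one BPU.
candidates = [
    {"mode": "intra", "distortion": 100, "bits": 40},
    {"mode": "inter", "distortion": 120, "bits": 10},
]
chosen_when_bits_matter = select_mode(candidates, lagrange_multiplier=1.0)
chosen_when_quality_matters = select_mode(candidates, lagrange_multiplier=0.1)
```

A larger multiplier penalizes bitrate more heavily and favors the cheaper inter candidate in this example, whereas a smaller multiplier favors the lower-distortion intra candidate.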
[0070] In the reconstruction path of process 200B, if intra-prediction mode is selected in the forward path, after generating the prediction reference 224 (e.g., the encoded and reconstructed current BPU in the current picture), the encoder can directly feed the prediction reference 224 to the spatial prediction stage 2042 for later use (e.g., for extrapolation of the next BPU of the current picture). The encoder can also feed the prediction reference 224 to the loop filter stage 232, where the encoder can reduce or eliminate distortions (e.g., blocking artifacts) introduced during the coding of the prediction reference 224 by applying a loop filter to the prediction reference 224. The encoder can apply various loop filtering techniques in the loop filter stage 232, such as deblocking, sample-adaptive offset (SAO), and adaptive loop filtering (ALF). Loop-filtered reference pictures can be stored in buffer 234 (or “Decoded Picture Buffer”) for later use (for example, as inter-prediction reference pictures for future pictures in video sequence 202). The encoder may store one or more reference pictures in buffer 234 for use in the temporal prediction stage 2044. In some embodiments, the encoder may encode the loop filter parameters (e.g., loop filter strength) along with the quantized transform coefficients 216, prediction data 206, and other information in the binary coding stage 226.
[0071] Figure 3A shows a schematic diagram of an exemplary decoding process 300A according to an embodiment of the present disclosure. Process 300A may be a decompression process corresponding to the compression process 200A in Figure 2A. In some embodiments, process 300A may be similar to the reconstruction path of process 200A. Based on process 300A, the decoder can decode the video bitstream 228 into a video stream 304. The video stream 304 may be very similar to the video sequence 202. However, due to information loss in the compression and decompression processes (e.g., the quantization stage 214 in Figures 2A-2B), the video stream 304 is generally not identical to the video sequence 202. Similar to processes 200A and 200B in Figures 2A-2B, the decoder may execute process 300A at the level of the basic processing unit (BPU) for each picture encoded in the video bitstream 228. For example, the decoder can iteratively execute process 300A, in which the decoder can decode the basic processing unit in a single iteration of process 300A. In some embodiments, the decoder can execute process 300A in parallel for each region of the encoded picture (e.g., regions 114-118) in the video bitstream 228.
[0072] In Figure 3A, the decoder can supply a portion of the video bitstream 228 associated with a basic processing unit of the encoded picture (called the "encoded BPU") to the binary decoding stage 302. In the binary decoding stage 302, the decoder can decode that portion into prediction data 206 and quantized transform coefficients 216. The decoder can generate a reconstructed residual BPU 222 by supplying the quantized transform coefficients 216 to the inverse quantization stage 218 and the inverse transformation stage 220. The decoder can generate a prediction BPU 208 by supplying the prediction data 206 to the prediction stage 204. The decoder can generate a prediction reference 224 by adding the reconstructed residual BPU 222 to the prediction BPU 208. In some embodiments, the prediction reference 224 may be stored in a buffer (e.g., a decoded picture buffer in computer memory). The decoder can supply the prediction reference 224 to the prediction stage 204 to perform prediction operations in the next iteration of process 300A.
[0073] The decoder can decode each of the encoded BPUs of the encoded picture by iteratively performing process 300A and generate a prediction reference 224 for decoding the next encoded BPU of the encoded picture. After decoding all of the encoded BPUs of the encoded picture, the decoder can output the picture to the video stream 304 for display and proceed to decode the next encoded picture in the video bitstream 228.
[0074] In the binary decoding stage 302, the decoder can perform the inverse operation of the binary coding technique used by the encoder (e.g., CABAC, entropy coding, variable-length coding, arithmetic coding, Huffman coding, or any other lossless compression algorithm). In some embodiments, in addition to the prediction data 206 and the quantized transform coefficients 216, the decoder can decode other information in the binary decoding stage 302, such as the prediction mode, prediction operation parameters, transform type, quantization process parameters (e.g., quantization parameters), and encoder control parameters (e.g., bitrate control parameters). In some embodiments, if the video bitstream 228 is transmitted in packets over the network, the decoder can depacketize the video bitstream 228 before supplying it to the binary decoding stage 302.
[0075] Figure 3B shows a schematic diagram of another exemplary decoding process 300B according to an embodiment of the present disclosure. Process 300B may be a modification of process 300A. For example, process 300B can be used with a decoder compliant with a hybrid video coding standard (e.g., the H.26x series). Compared to process 300A, process 300B further divides the prediction stage 204 into a spatial prediction stage 2042 and a temporal prediction stage 2044, and further includes a loop filter stage 232 and a buffer 234.
[0076] In process 300B, the prediction data 206 decoded by the decoder in the binary decoding stage 302 for the encoded basic processing unit ("current BPU") of the encoded picture being decoded ("current picture") may include various data depending on the prediction mode used by the encoder to encode the current BPU. For example, if intra-prediction is used by the encoder to encode the current BPU, the prediction data 206 may include a prediction mode indicator (e.g., a flag value) indicating intra-prediction, and parameters of the intra-prediction operation. Parameters of the intra-prediction operation may include, for example, the location (e.g., coordinates) of one or more adjacent BPUs used for reference, the size of the adjacent BPUs, extrapolation parameters, and the orientation of the adjacent BPUs relative to the original BPU. Also, for example, if inter-prediction is used by the encoder to encode the current BPU, the prediction data 206 may include a prediction mode indicator (e.g., a flag value) indicating inter-prediction, and parameters of the inter-prediction operation. Parameters of the inter-prediction operation may include, for example, the number of reference pictures associated with the current BPU, the weights associated with each reference picture, the location (e.g., coordinates) of one or more matching regions in each reference picture, and one or more motion vectors associated with each matching region.
[0077] Based on the prediction mode indicator, the decoder can decide whether to perform spatial prediction (e.g., intra-prediction) in the spatial prediction stage 2042 or temporal prediction (e.g., inter-prediction) in the temporal prediction stage 2044. Details of performing such spatial or temporal predictions are described in Figure 2B and will not be repeated below. After performing such spatial or temporal predictions, the decoder can generate a prediction BPU 208. The decoder can generate a prediction reference 224 by adding the prediction BPU 208 and the reconstructed residual BPU 222, as shown in Figure 3A.
[0078] In process 300B, the decoder can supply the prediction reference 224 to the spatial prediction stage 2042 or the temporal prediction stage 2044 to perform prediction operations in the next iteration of process 300B. For example, when the current BPU is decoded by intra-prediction in the spatial prediction stage 2042, after generating the prediction reference 224 (e.g., the decoded current BPU), the decoder can supply the prediction reference 224 directly to the spatial prediction stage 2042 for later use (e.g., for extrapolation of the next BPU of the current picture). When the current BPU is decoded by inter-prediction in the temporal prediction stage 2044, after generating the prediction reference 224 (e.g., the reference picture with all BPUs decoded), the decoder can reduce or remove distortion (e.g., blocking artifacts) by supplying the prediction reference 224 to the loop filter stage 232. The decoder can apply a loop filter to the prediction reference 224, as described with reference to Figure 2B. Loop-filtered reference pictures can be stored in buffer 234 (e.g., a decoded picture buffer in computer memory) for later use (e.g., for use as inter-prediction reference pictures for future encoded pictures of the video bitstream 228). The decoder may store one or more reference pictures in buffer 234 for use in the temporal prediction stage 2044. In some embodiments, the prediction data may further include loop filter parameters (e.g., loop filter strength). In some embodiments, the prediction data includes loop filter parameters if the prediction mode indicator of the prediction data 206 indicates that inter-prediction was used to encode the current BPU.
[0079] Figure 4 is a block diagram of an exemplary apparatus 400 for encoding or decoding video according to some embodiments of the present disclosure. As shown in Figure 4, the apparatus 400 may include a processor 402. When the processor 402 executes instructions described herein, the apparatus 400 can become a dedicated machine for video encoding or decoding. The processor 402 may be any type of circuit capable of manipulating or processing information. For example, the processor 402 may include any number of central processing units (or "CPUs"), graphics processing units (or "GPUs"), neural processing units ("NPUs"), microcontroller units ("MCUs"), optical processors, programmable logic controllers, microcontrollers, microprocessors, digital signal processors, intellectual property (IP) cores, programmable logic arrays (PLAs), programmable array logic (PALs), generic array logic (GALs), complex programmable logic devices (CPLDs), field-programmable gate arrays (FPGAs), systems on a chip (SoCs), or application-specific integrated circuits (ASICs), and any combination thereof. In some embodiments, the processor 402 may be a set of processors grouped as a single logical component. For example, as shown in Figure 4, the processor 402 may include a plurality of processors, including processor 402a, processor 402b, and processor 402n.
[0080] The apparatus 400 may include a memory 404 configured to store data (e.g., a set of instructions, computer code, intermediate data, etc.). For example, as shown in Figure 4, the stored data may include program instructions (e.g., program instructions for implementing stages in processes 200A, 200B, 300A, or 300B) and processing data (e.g., video sequence 202, video bitstream 228, or video stream 304). The processor 402 can access the program instructions and processing data (e.g., via the bus 410) and perform operations or manipulations on the processing data by executing the program instructions. The memory 404 may include a high-speed random-access storage device or a non-volatile storage device. In some embodiments, the memory 404 may include any combination of any number of random-access memories (RAM), read-only memories (ROM), optical disks, magnetic disks, hard drives, solid-state drives, flash drives, secure digital (SD) cards, memory sticks, or compact flash (CF) cards. Memory 404 may be a group of memories grouped as a single logical component (not shown in Figure 4).
[0081] Bus 410 may be a communication device that transfers data between components within the apparatus 400, such as an internal bus (e.g., a CPU memory bus) or an external bus (e.g., a universal serial bus port, a peripheral component interconnect express port).
[0082] In this disclosure, for the sake of ease of explanation without causing ambiguity, the processor 402 and other data processing circuits are collectively referred to as the "data processing circuits." The data processing circuits may be implemented entirely as hardware, or as a combination of software, hardware, or firmware. The data processing circuits may also be a single, independent module, or they may be combined in whole or in part with any other component of the device 400.
[0083] The device 400 may further include a network interface 406 to provide wired or wireless communication to and from a network (e.g., the Internet, an intranet, a local area network, a mobile communication network, etc.). In some embodiments, the network interface 406 may include any number of network interface controllers (NICs), radio frequency (RF) modules, transponders, transceivers, modems, routers, gateways, wired network adapters, wireless network adapters, Bluetooth adapters, infrared adapters, near-field communication ("NFC") adapters, or any combination of such components.
[0084] In some embodiments, the apparatus 400 may optionally further include a peripheral interface 408 to provide connectivity with one or more peripheral devices. As shown in Figure 4, peripheral devices may include, but are not limited to, cursor control devices (e.g., mouse, touchpad, or touchscreen), keyboards, displays (e.g., cathode ray tube displays, liquid crystal displays, or light-emitting diode displays), video input devices (e.g., cameras, or input interfaces coupled to video archives), etc.
[0085] Furthermore, the video codec (for example, the codec that executes processes 200A, 200B, 300A, or 300B) can be implemented as any combination of any software or hardware modules in the device 400. For example, some or all stages of processes 200A, 200B, 300A, or 300B can be implemented as one or more software modules of the device 400, such as program instructions that can be loaded into memory 404. Alternatively, some or all stages of processes 200A, 200B, 300A, or 300B can be implemented as one or more hardware modules of the device 400, such as dedicated data processing circuits (for example, FPGA, ASIC, NPU, etc.).
[0086] This disclosure provides a method for initializing the probabilities of a context model used in CABAC. Figure 5 is a schematic diagram showing a context-based adaptive binary arithmetic coding (CABAC) engine 500 according to some embodiments of this disclosure. Consistent with the disclosed embodiments, the CABAC engine 500 can be used by an encoder to perform binary coding, for example, the CABAC engine 500 can be used in the binary coding stage 226 in Figure 2A or Figure 2B.
[0087] Referring to Figure 5, the CABAC engine 500 includes three basic stages: binarization 502, context modeling (CM) 504, and binary arithmetic encoding (BAE) 506.
[0088] Binarization 502 is a data preprocessing procedure. In binarization 502, non-binary syntax elements are mapped into strings of binary symbols called "bins". Each binary symbol is simply called a bin. Input syntax elements can be binarized using various methods, including table mapping, unary coding, truncated unary coding, fixed-length coding, and unary/k-th order Exp-Golomb (UEG-k) coding.
[0089] For example, UEG-k can be used to binarize motion vector differences as follows. In this example, assume that the value mvd of a motion vector difference component is given. For the prefix portion of the UEG-k bin string, truncated unary (TU) binarization with a cutoff value of S=9 is applied to min(|mvd|, 9). If mvd is equal to zero, the bin string contains only the prefix codeword "0". If the condition |mvd|≧9 is met, the suffix is constructed as an EG3 codeword for the value |mvd|-9, followed by the sign of mvd, using the sign bit "1" if mvd is negative and the sign bit "0" otherwise. For mvd values with 0<|mvd|<9, the suffix contains only the sign bit. Note that the components of the motion vector difference represent the prediction error with 1/4 sample accuracy, so the prefix portion corresponds to a maximum error component of ±2 samples. With the Exp-Golomb parameter k=3, the suffix codewords are such that a geometric increase in the prediction error in units of two samples is captured by a linear increase in the length of the corresponding suffix codeword.
[0090] The UEG-k binarization of the absolute value of the transform coefficient level (abs_level) is specified by the cutoff value S=14 for the TU prefix portion and the order k=0 for the EGk suffix portion. The binarization and subsequent coding process are applied to the syntax element coeff_abs_value_minus1=abs_level-1, because zero-value transform coefficient levels are encoded using a significance map. The structure of the bin string for a given value of coeff_abs_value_minus1 is similar to the structure of the UEG-k bin string for the motion vector difference component, except that the suffix does not have a sign bit. The table in Figure 6 shows the bin strings corresponding to abs_level values from 1 to 20, with the prefix portion highlighted by a gray shaded column.
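As a non-limiting illustration, the UEG-k binarization used for both motion vector differences (S=9, k=3, signed) and coefficient levels (S=14, k=0, unsigned) can be sketched in Python as follows. The function names and the representation of bin strings as text are assumptions made for illustration; the suffix loop follows the conventional k-th order Exp-Golomb construction rather than any wording of this disclosure.

```python
def truncated_unary(value, cutoff):
    """TU prefix: min(value, cutoff) ones, terminated by a zero only if value < cutoff."""
    bins = [1] * min(value, cutoff)
    if value < cutoff:
        bins.append(0)
    return bins


def exp_golomb_suffix(value, k):
    """k-th order Exp-Golomb codeword for a non-negative value."""
    bins = []
    while value >= (1 << k):
        bins.append(1)           # unary part: each 1 doubles the covered range
        value -= 1 << k
        k += 1
    bins.append(0)               # terminator of the unary part
    for shift in range(k - 1, -1, -1):
        bins.append((value >> shift) & 1)  # k-bit binary remainder
    return bins


def ueg_k(value, cutoff, k, signed):
    """Concatenate TU prefix, optional EGk suffix, and optional sign bit."""
    magnitude = abs(value)
    bins = truncated_unary(magnitude, cutoff)
    if magnitude >= cutoff:
        bins += exp_golomb_suffix(magnitude - cutoff, k)
    if signed and value != 0:
        bins.append(1 if value < 0 else 0)
    return "".join(str(b) for b in bins)


def binarize_mvd(mvd):
    return ueg_k(mvd, cutoff=9, k=3, signed=True)


def binarize_abs_level(abs_level):
    # The syntax element actually binarized is abs_level - 1.
    return ueg_k(abs_level - 1, cutoff=14, k=0, signed=False)


mvd_zero = binarize_mvd(0)          # "0"
mvd_minus_three = binarize_mvd(-3)  # "1110" prefix followed by sign bit "1"
level_sixteen = binarize_abs_level(16)
```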
[0091] Referring again to Figure 5, the context modeling 504 calculates a context index (ctxIdx) for each bin. This context index is used to retrieve the context model, which is stored as a probability state table. These tables are updated for each bin and reinitialized at the beginning of each slice.
[0092] For example, many (e.g., 399) context models can be stored in the encoder. The context index (ctxIdx) is the sum of the context index offset (ctxIdxOffset) and the context index increment (ctxIdxInc). An exception is the context index of a residual syntax element, which is the sum of ctxIdxOffset, ctxIdxInc, and the context block category offset (ctxBlockCatOffset). ctxBlockCatOffset depends on the context block category of the macroblock currently being encoded. ctxIdxOffset is determined by the type of syntax element and the type of slice. ctxIdxInc differs for each bin of the coded syntax element and therefore depends on the bin index (binIdx). In some cases, the calculation of ctxIdxInc depends on neighbor information. For residual syntax elements, the ctxIdxInc calculation also depends on the scan position of the currently coded element and the number of previously coded coefficients.
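As a non-limiting illustration, the context index derivation can be sketched in Python as follows. The numeric offsets and the neighbor-based increment rule (counting how many of the left and above neighbors have a flag set) are hypothetical values chosen for illustration and are not taken from this disclosure.

```python
def context_index(ctx_idx_offset, ctx_idx_inc, ctx_block_cat_offset=None):
    """ctxIdx = ctxIdxOffset + ctxIdxInc, plus ctxBlockCatOffset for residual syntax elements."""
    ctx_idx = ctx_idx_offset + ctx_idx_inc
    if ctx_block_cat_offset is not None:
        ctx_idx += ctx_block_cat_offset
    return ctx_idx


def neighbor_increment(left_flag, above_flag):
    """Hypothetical ctxIdxInc rule that depends on neighbor information."""
    return int(bool(left_flag)) + int(bool(above_flag))


# Non-residual syntax element: offset chosen by element type and slice type (hypothetical 11).
flag_ctx = context_index(11, neighbor_increment(left_flag=True, above_flag=True))

# Residual syntax element: an additional block-category offset applies (hypothetical values).
residual_ctx = context_index(105, 1, ctx_block_cat_offset=4)
```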
[0093] Each bin value and its corresponding ctxIdx are sent to the binary arithmetic coding module 506. The binary arithmetic coding module 506 stores information such as the most probable symbol (MPS) and its probability state, which together form the context information accessed by ctxIdx. Each bin has two possible symbol values, namely 0 and 1. If one symbol is the most probable symbol, the other is the least probable symbol (LPS). The context information for each ctxIdx (e.g., in the range of 0 to 399), consisting of a probability state index and the value of the most probable symbol, can be stored using a context memory or a context table.
[0094] The regular coding engine 508 performs arithmetic coding, in which the coding interval is set and updated based on the probabilities of the MPS and the LPS. Codewords for arithmetic coding are generated by recursively subdividing the interval. Two variables, Low and Range (called codILow and codIRange, respectively), are used to track the interval. Figure 7 shows the values of Range and Low and when they are updated.
[0095] Range is held in a 9-bit register and has an initial value of 510. Low is held in a 10-bit register and has an initial value of 0. rMPS and rLPS denote the sub-intervals corresponding to the MPS and the LPS, respectively. If the input bin is equal to the MPS, rMPS is selected as the new interval; otherwise, rLPS is selected. If the updated Range falls outside the interval from 256 to 511 (inclusive), a renormalization procedure is invoked. The renormalization procedure is where most of the codeword is constructed.
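As a non-limiting illustration, one interval subdivision step followed by renormalization can be sketched in Python as follows. The sketch assumes that the MPS sub-interval occupies the lower part of the current interval, takes rLPS as a given input, and deliberately omits bit output, carry handling, and the 10-bit limit on Low; the function name and the rLPS value of 128 are hypothetical.

```python
def encode_bin_step(low, rng, bin_value, mps, r_lps):
    """Update (Low, Range) for one bin and renormalize; returns (low, rng, shifts)."""
    r_mps = rng - r_lps
    if bin_value == mps:
        rng = r_mps              # keep the MPS sub-interval
    else:
        low += r_mps             # skip past the MPS sub-interval
        rng = r_lps              # keep the LPS sub-interval
    shifts = 0
    while rng < 256:             # renormalize until Range is back in [256, 511]
        rng <<= 1
        low <<= 1                # a full encoder would emit codeword bits here
        shifts += 1
    return low, rng, shifts


# Starting from the initial state Low = 0, Range = 510 with a hypothetical rLPS of 128.
after_mps = encode_bin_step(0, 510, bin_value=0, mps=0, r_lps=128)
after_lps = encode_bin_step(0, 510, bin_value=1, mps=0, r_lps=128)
```

In this example the MPS path leaves Range at 382 with no renormalization, while the LPS path shrinks Range to 128 and needs one doubling to return to 256.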
[0096] To calculate the rLPS value, the LPS probability $P_{LPS}$ is required. This value is in the range of 0 to 0.5 and is quantized into 64 discrete probability states. These states are indexed by the variable pStateIdx, which is in the range of 0 to 63. The transition to the next state based on the current bin is shown in Figure 8. A lookup table is used to update the probability states.
[0097] Another multiplication in this stage, $\mathrm{rLPS} = \mathrm{Range} \times P_{LPS}$, is likewise replaced by a lookup table. Range is quantized to four values, $R_Q$. The product rLPS is then quantized to 256 values based on $R_Q$ and pStateIdx. As a result, rLPS can be obtained simply by looking up a two-dimensional table indexed by $R_Q$ and pStateIdx.
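As a non-limiting illustration, a 64 x 4 lookup table that replaces the multiplication can be sketched in Python as follows. The table contents here are illustrative only and are not the normative table of any standard: the geometric probability model (0.5 at state 0 decaying to 0.01875 at state 63), the cell midpoints used as representative Range values, and the use of bits 6 and 7 of Range as the quantizer index are all assumptions.

```python
# Assumed geometric model: P_LPS(state) = 0.5 * ALPHA ** state, reaching 0.01875 at state 63.
ALPHA = (0.01875 / 0.5) ** (1.0 / 63)

# Assumed representative Range value for each of the four cells of [256, 511].
RANGE_REPRESENTATIVES = [288, 352, 416, 480]

# 64 states x 4 quantized ranges = 256 precomputed rLPS values.
R_LPS_TABLE = [
    [max(1, round(rep * 0.5 * ALPHA ** state)) for rep in RANGE_REPRESENTATIVES]
    for state in range(64)
]


def lookup_r_lps(rng, p_state_idx):
    """Quantize Range to one of four cells and read rLPS from the table."""
    quantized_range_idx = (rng >> 6) & 3   # 256..319 -> 0, ..., 448..511 -> 3
    return R_LPS_TABLE[p_state_idx][quantized_range_idx]


r_lps_initial = lookup_r_lps(510, 0)
```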
[0098] Binary arithmetic coding also uses two other coding modes. In the bypass coding engine 510, the context modeling stage is bypassed. This means that the previous values of Range and Low are used and the renormalization procedure of the bypass mode is invoked. In the bypass coding engine, the probability of each of the two symbols is considered to be equal to 0.5.
[0099] Furthermore, when the "end of slice" syntax element is reached, or when the macroblock type (mb_type) is I_PCM, the terminate coding engine (not shown in Figure 5) can be invoked. Again, no context model is selected. The LPS is fixed to 1, and rLPS is fixed to 2. Otherwise, the renormalization is the same. There is also a flush algorithm that is invoked when the end of the slice is reached.
[0100] Each time a new slice is started, an initialization algorithm is called to reset all 399 context models based on the slice type and the cabac_init_idc value. The slice QP is used to calculate the exact initial state of each context.
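As a non-limiting illustration, a QP-dependent initialization of one context can be sketched in Python as follows. The linear form with per-context parameters m and n and the clipping bounds follow the H.264/AVC-style derivation as commonly described, which is an assumption here since this disclosure does not give the formula; the (m, n) pairs below are hypothetical, whereas a real codec would read them from tables selected by slice type and cabac_init_idc.

```python
def clip3(lower, upper, value):
    """Clamp value to the inclusive range [lower, upper]."""
    return max(lower, min(upper, value))


def init_context(m, n, slice_qp):
    """Return (pStateIdx, valMPS) for one context from its (m, n) pair and the slice QP."""
    pre_ctx_state = clip3(1, 126, ((m * clip3(0, 51, slice_qp)) >> 4) + n)
    if pre_ctx_state <= 63:
        return 63 - pre_ctx_state, 0   # LPS-side half: MPS is 0
    return pre_ctx_state - 64, 1       # MPS-side half: MPS is 1


# Hypothetical initialization parameters for two contexts at slice QP 26.
state_a = init_context(m=20, n=-15, slice_qp=26)
state_b = init_context(m=0, n=100, slice_qp=26)
```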
[0101] The CABAC engine 500 (Figure 5) described above is explained in relation to the encoder. The inverse operation of the CABAC engine 500 may be performed in the decoder's binary decoding stage 302 (Figure 3A or Figure 3B).
[0102] In VVC, the probabilities of each context model in the context-based adaptive binary arithmetic coding (CABAC) engine are initialized based on the slice type (i.e., I-slice, P-slice, B-slice). The probabilities of the context models are updated when encoding or decoding bins. To improve the accuracy of probability estimation, a multiple-hypothesis probability update model is supported. Two probabilities are associated with each context model and are updated independently at different adaptation rates. The adaptation rates of the probabilities for each context model are pre-trained based on the statistics of the associated bins. The probability used to code a bin is the average of the estimates from the two hypotheses.
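As a non-limiting illustration, the two-hypothesis update can be sketched in Python as follows. The floating-point representation, the class and attribute names, and the shift values 4 and 7 are assumptions made for illustration; actual codecs use fixed-point arithmetic and pre-trained, per-context rates.

```python
class TwoRateContext:
    """Context model holding two probability estimates updated at different rates."""

    def __init__(self, initial_probability, fast_shift, slow_shift):
        self.p_fast = initial_probability   # adapts quickly, tracks local statistics
        self.p_slow = initial_probability   # adapts slowly, gives a stable estimate
        self.fast_shift = fast_shift
        self.slow_shift = slow_shift

    def probability_of_one(self):
        """Probability used for coding: the average of the two hypotheses."""
        return (self.p_fast + self.p_slow) / 2

    def update(self, bin_value):
        """Move each estimate toward the observed bin by 1 / 2**shift of the error."""
        self.p_fast += (bin_value - self.p_fast) / (1 << self.fast_shift)
        self.p_slow += (bin_value - self.p_slow) / (1 << self.slow_shift)


context = TwoRateContext(initial_probability=0.5, fast_shift=4, slow_shift=7)
context.update(1)
probability_after_one_bin = context.probability_of_one()

for _ in range(200):
    context.update(1)
probability_after_many_ones = context.probability_of_one()
```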
[0103] The Enhanced Compression Model (ECM), which is used to further improve coding performance beyond the VVC standard, refines the multi-hypothesis probability update model. The adaptation rates of the two probabilities associated with each context model differ by slice type and are initialized according to the slice type. Furthermore, to improve the accuracy of probability estimation for differing statistics in different domains, the adaptation rates are adjusted by two delta parameters per context, which are retrieved from a lookup table using previously coded bins as an index. In addition, the simple average of the two probabilities is extended to a weighted average: for each context model, three different sets of weights are predefined, one each for the I-slice, B-slice, and P-slice types.
[0104] In ECM, for inter-slices (i.e., B-slices and P-slices), the probabilities of the context models can be predicted from a previously coded picture having the same slice type, QP, and temporal ID.
[0105] In current ECM designs, the probability of each context model is initialized according to the slice type or predicted from a previously coded picture. However, this may not be optimal because each picture may have different content. This disclosure provides a method to solve this problem.
[0106] In some embodiments, it is proposed to select, independently for each slice, one context model probability set from multiple context model probability sets. The multiple context model probability sets comprise N sets, where N is a positive integer. Each context model probability set may include any subset of {initial probabilities, adaptation rates, weights applied to the two probabilities, adjustment parameters}. For each slice, the selection can be based on the slice type, QP, temporal ID, low-latency condition, rate cost, and so on. The selection may be derived in both the encoder and the decoder without signaling, or it may be signaled in the bitstream.
[0107] As an example, Figure 9 shows a context model probability parameter set 902, which includes the initial probability, the adaptation rate, and the weights applied to the two probabilities associated with the two context models.
[0108] As another example, Figure 10 shows a context model probability parameter set 1002, which includes initial probabilities, adaptation rates, weights applied to the two probabilities, and adjustment parameters associated with each context model. Note that adjustment parameters can be shared across multiple context model probability sets.
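One possible in-memory layout for such parameter sets, with the adjustment parameters shared between sets, is sketched below. The type names, field names, context name `ctx_a`, and all numeric values are hypothetical placeholders; the disclosure does not specify a data layout.

```python
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass(frozen=True)
class ContextParams:
    """Per-context entries of one probability parameter set (hypothetical layout)."""
    initial_probability: float
    adaptation_rates: Tuple[int, int]   # one rate per hypothesis
    weights: Tuple[float, float]        # weights applied to the two probabilities


@dataclass
class ProbabilityParameterSet:
    per_context: Dict[str, ContextParams]
    adjustments: Dict[str, Tuple[int, int]]  # may be one object shared by all sets


# Placeholder values; a single adjustment table is referenced by both sets.
SHARED_ADJUSTMENTS = {"ctx_a": (1, -1)}

set_for_b = ProbabilityParameterSet(
    {"ctx_a": ContextParams(0.40, (4, 7), (0.5, 0.5))}, SHARED_ADJUSTMENTS)
set_for_p = ProbabilityParameterSet(
    {"ctx_a": ContextParams(0.45, (5, 8), (0.6, 0.4))}, SHARED_ADJUSTMENTS)
```

Sharing one adjustment table by reference keeps the per-set storage limited to the entries that actually differ between sets.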
[0109] As another example, Figure 11 shows four predefined context model stochastic parameter sets, including a first context model stochastic parameter set 1102 for B slices and non-low-latency conditions, a second context model stochastic parameter set 1104 for P slices, a third context model stochastic parameter set 1106 for I slices, and a fourth context model stochastic parameter set 1108 for B slices and low-latency conditions. For each slice, the context model probability is selected based on its slice type and low-latency condition. This selection is derived in both the encoder and decoder without signaling in the bitstream.
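Because this selection depends only on information available to both sides, the encoder and the decoder can run the same rule. A minimal sketch follows; the function name `select_parameter_set` and the use of the Figure 11 reference numerals as return values are choices made for this illustration.

```python
from enum import Enum


class SliceType(Enum):
    I = "I"
    P = "P"
    B = "B"


def select_parameter_set(slice_type: SliceType, low_latency: bool) -> int:
    """Return the Figure 11 reference numeral of the set used for a slice."""
    if slice_type is SliceType.I:
        return 1106                      # third set: I slices
    if slice_type is SliceType.P:
        return 1104                      # second set: P slices
    # B slices: the fourth set under low latency, otherwise the first set.
    return 1108 if low_latency else 1102
```

Since nothing is written to the bitstream, encoder and decoder stay synchronized only if both derive the low-latency condition in the same way.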
[0110] As another example, X+1 context model probability parameter sets are predefined, where X is determined based on the number of temporal IDs. The first context model probability parameter set is for non-inter-slices. The second set is for inter-slices with temporal ID equal to 0, and the third set is for inter-slices with temporal ID equal to 1. Similarly, the k-th set is for inter-slices with temporal ID equal to k-2. For each inter-slice, the context model probability parameter set is selected based on its temporal ID; for non-inter-slices, the first context model probability parameter set is always used.
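The index rule in this example reduces to a one-line mapping, sketched here with 1-based set indices as in the text. The function name and the range check are assumptions added for the illustration.

```python
def select_set_index(is_inter_slice: bool, temporal_id: int, num_sets: int) -> int:
    """Return the 1-based index of the parameter set for a slice.

    num_sets is X + 1, where X is the number of temporal IDs.
    """
    if not is_inter_slice:
        return 1                         # non-inter slices always use the first set
    index = temporal_id + 2              # the k-th set serves temporal ID k - 2
    if not 2 <= index <= num_sets:
        raise ValueError("temporal ID outside the range covered by the sets")
    return index
```

With four temporal IDs (X = 4, five sets), temporal IDs 0 through 3 map to sets 2 through 5.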
[0111] As another example, there are five context model probability parameter sets. The first, second, third, and fourth sets are predefined for I-slices, P-slices, B-slices under non-low-latency conditions, and B-slices under low-latency conditions, respectively, and the fifth set is predicted from a previously coded picture. For I-slices, the first set is used; for non-I-slices (i.e., inter-slices), one of the second, third, fourth, and fifth sets is selected based on rate cost. The rate cost is calculated using each candidate set, and the set with the minimum rate cost is used. For inter-slices, a parameter indicating which set is selected is signaled in the bitstream.
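The minimum-rate-cost choice can be illustrated with a deliberately simplified model. Here each candidate set is reduced to a single static probability that a bin equals 1, and the rate cost is the ideal code length in bits; both simplifications, and the names `rate_cost` and `choose_set`, are assumptions for this sketch and are not how the disclosure defines rate cost.

```python
import math


def rate_cost(bins, p_one: float) -> float:
    """Ideal code length in bits of `bins` under a fixed P(bin == 1) = p_one."""
    return sum(-math.log2(p_one if b else 1.0 - p_one) for b in bins)


def choose_set(bins, candidates: dict) -> int:
    """Return the index of the candidate set with the minimum rate cost.

    `candidates` maps a set index (2..5 for inter slices) to its assumed
    probability, which must lie strictly between 0 and 1.
    """
    return min(candidates, key=lambda index: rate_cost(bins, candidates[index]))
```

For instance, for the bins `[1, 1, 1, 0]` and candidate probabilities `{2: 0.5, 3: 0.75, 4: 0.25, 5: 0.9}`, set 3 gives the shortest ideal code length, and its index is what the encoder would then signal.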
[0112] Similarly, in another example, there are five context model probability parameter sets, including a first, second, third, and fourth set of predefined sets for I-slices, P-slices, B-slices under non-low-latency conditions, and B-slices under low-latency conditions, respectively. The fifth set is predicted from previously coded pictures. For inter-slices, it is possible to choose whether or not to use the fifth set predicted from previously coded pictures. If it is decided not to use the fifth set, the set is selected by the slice type and low-latency condition.
[0113] In some embodiments, it is proposed to select a context model probabilistic parameter set from multiple context model probabilistic parameter sets associated with each video sequence. The multiple context model probabilistic parameter sets comprise N sets, where N is a positive integer. The context model probabilistic parameter sets may include any subset of {initial probabilities, adaptation rates, weights applied to the two probabilities, and adjustment parameters}. The selection may be signaled to the bitstream associated with the video sequence.
[0114] As an example, four context model probability parameter sets are predefined: a first set for B-slices, a second set for P-slices, a third set for I-slices, and a fourth set for B-slices. Each set includes initial probabilities, adaptation rates, and weights applied to the two probabilities. The adjustment parameters are shared among the four sets (i.e., they are the same for all four). For each slice, the corresponding context model probability parameter set is determined by its slice type. A flag may be signaled in the bitstream at the SPS level to indicate whether the first or the fourth set is used for B-slices.
[0115] Similarly, in another example, each context model probability parameter set includes an initial probability, an adaptation rate, weights applied to the two probabilities, and adjustment parameters.
[0116] Similarly, in another example, an SPS-level flag signaled to the bitstream to indicate whether a first or fourth context model stochastic parameter set is used for a B-slice is determined by a low-latency condition in the encoder. If the low-latency condition is determined, the fourth context model stochastic parameter set is used to initialize the context model for the B-slice, and the SPS-level flag is encoded to indicate that the fourth context model stochastic parameter set was selected. Alternatively, if the non-low-latency condition is determined, the first context model stochastic parameter set is used to initialize the context model for the B-slice, and the SPS-level flag is encoded to indicate that the first context model stochastic parameter set was selected.
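The encoder-side derivation of the SPS-level flag and the decoder-side use of it can be sketched as a pair of small functions. The flag polarity (1 meaning the fourth set) and all names are assumptions for this illustration; the disclosure does not fix the flag's binary encoding.

```python
FIRST_SET, SECOND_SET, THIRD_SET, FOURTH_SET = 1, 2, 3, 4


def encoder_sps_flag(low_latency: bool) -> int:
    """Encoder side: derive the SPS-level flag from the low-latency condition."""
    return 1 if low_latency else 0       # assumed polarity: 1 selects the fourth set


def decoder_set_for_slice(slice_type: str, sps_flag: int) -> int:
    """Decoder side: pick the parameter set from the slice type and the flag."""
    if slice_type == "I":
        return THIRD_SET
    if slice_type == "P":
        return SECOND_SET
    return FOURTH_SET if sps_flag else FIRST_SET   # B slices
```

Only B-slices consult the flag, so the decoder never needs to know the low-latency condition itself.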
[0117] As another example, four context model stochastic parameter sets are predefined. These four context model stochastic parameter sets include a first context model stochastic parameter set for B slices, a second context model stochastic parameter set for P slices, a third context model stochastic parameter set for I slices, and a fourth context model stochastic parameter set for B slices. Each context model stochastic parameter set includes initial probabilities, an adaptation rate, and weights applied to the two probabilities. Adjustment parameters are shared among the four context model stochastic parameter sets (i.e., they are the same for all four context model stochastic parameter sets). Furthermore, for B slices and P slices, it is decided whether or not to predict their context models from the previously coded pictures. If the context models for B slices and P slices are not predicted from the previously coded pictures, the second context model stochastic parameter set is used for P slices, and an SPS-level flag is signaled to indicate whether the first or fourth context model stochastic parameter set is used for B slices. Alternatively, if the context models for B-slice and P-slice are predicted from a previously coded picture, a predefined set of context model probability parameters is not used. Unlike B-slice and P-slice, I-slice does not depend on a previously coded picture and always uses a third set of context model probability parameters.
[0118] In some embodiments, in addition to or instead of the SPS-level flag, a flag may be signaled at the PPS level, in the picture header, or in the slice header of the bitstream to indicate whether the first or the fourth context model probability parameter set is used for a B-slice.
[0119] In some embodiments, there are N predefined sets of context model probability parameters for the I-slice type, P-slice type, and B-slice type. Three parameters are signaled to indicate which of the N context model probability parameter sets is used for the I-slice type, the P-slice type, and the B-slice type, respectively.
[0120] In some embodiments, there are two predefined sets of context model probability parameters for each of the I-slice type, P-slice type, and B-slice type. A flag is signaled at the SPS level for each slice type to indicate which of the two sets associated with that slice type is used.
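With two candidate sets per slice type, the three SPS-level flags resolve to one set per slice type. A minimal sketch follows; the set labels and the name `resolve_sets` are hypothetical.

```python
# Two hypothetical predefined sets per slice type, indexed by the flag value.
PREDEFINED_SETS = {
    "I": ("I_set_0", "I_set_1"),
    "P": ("P_set_0", "P_set_1"),
    "B": ("B_set_0", "B_set_1"),
}


def resolve_sets(sps_flags: dict) -> dict:
    """Map each slice type to the set chosen by its SPS-level flag (0 or 1)."""
    return {slice_type: PREDEFINED_SETS[slice_type][sps_flags[slice_type]]
            for slice_type in PREDEFINED_SETS}
```

The result can be computed once per sequence and reused at the start of every slice.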
[0121] The above-described method for initializing the probabilistic parameters of the context model can be performed in the encoder and decoder. Figure 12 shows a flowchart of an exemplary method 1200 for decoding a bitstream associated with video, according to some embodiments of the present disclosure. During CABAC execution, method 1200 may be performed by the decoder (e.g., by process 300A in Figure 3A or process 300B in Figure 3B) or by one or more software or hardware components of the apparatus (e.g., apparatus 400 in Figure 4). For example, one or more processors (e.g., processor 402 in Figure 4) can perform method 1200. In some embodiments, method 1200 can be implemented by a computer program product embodied on a computer-readable medium, which includes computer-executable instructions such as program code executed by a computer (e.g., apparatus 400 in Figure 4). As shown in Figure 12, method 1200 includes the following steps 1210-1220.
[0122] In step 1210, the processor (for example, processor 402 in Figure 4) selects a first set of probabilistic parameters for initializing one or more context models used in CABAC for the first slice. The first slice may be the current slice that the processor is decoding. The first set of probabilistic parameters may be selected from a plurality of predefined sets of probabilistic parameters.
[0123] The selected first set of probabilistic parameters can be used to initialize the context models. For example, two context models may be used to perform CABAC. Before performing CABAC on a new slice, the processor needs to initialize the probabilistic parameters used by those two context models.
[0124] Figure 9 provides an example of a stochastic parameter set. As shown in Figure 9, the stochastic parameter set 902 can include one or more of the following: initial probabilities for use with one or more context models, adaptation rates for one or more context models, multiple weights associated with each of the probabilities, or adjusted probabilities for use with one or more context models. Initial probabilities are the initial probability values used for the context models in the CABAC of the slice. If two context models are used in the CABAC, the stochastic parameter set 902 may include two initial probabilities (one for each context model). Adaptation rates are the rates at which the stochastic states of the context models are adapted. Similarly, if two context models are used in the CABAC, the stochastic parameter set 902 may include two adaptation rates (one for each context model). Weights are used to weight the context models. If two context models are used in the CABAC, the stochastic parameter set 902 may include two weights (one for each context model). The above explanation assumes that two context models are used in the CABAC of the slice, but it is possible to use more context models (e.g., three or four). In those cases, the probability parameter set 902 may include more initial probabilities, adaptation rates, and / or weights.
[0125] Furthermore, after the CABAC of the slice has started, the initial probabilities of the context model can be adjusted. The probability parameter set 1002 in Figure 10 further includes one or more adjustment parameters. The adjustment parameters are the adjusted probabilities available to the context model after model initialization.
[0126] In some embodiments, the processor can select the first set of probability parameters from the plurality of predefined sets without explicit signaling. For example, the selection may be based on one or more of the following, each associated with the first slice: the slice type, the quantization parameter (QP), the temporal identifier, the low-latency condition, or the rate cost. Examples of selection based on these parameters are given above and are not repeated here.
[0127] Referring again to Figure 12, in step 1220, the processor performs entropy decoding of the first slice based on one or more context models and a first set of probabilistic parameters.
[0128] In some embodiments, the selection of an initial probability parameter set may be signaled to the bitstream. Figure 13 shows a flowchart of an exemplary method 1300 for decoding a bitstream associated with video, according to some embodiments of the present disclosure. During CABAC execution, method 1300 may be executed by a decoder (e.g., by process 300A in Figure 3A or process 300B in Figure 3B) or by one or more software or hardware components of an apparatus (e.g., apparatus 400 in Figure 4). For example, one or more processors (e.g., processor 402 in Figure 4) can execute method 1300. In some embodiments, method 1300 may be implemented by a computer program product embodied on a computer-readable medium, which includes computer-executable instructions such as program code executed by a computer (e.g., apparatus 400 in Figure 4). As shown in Figure 13, method 1300 includes the following steps 1310-1320.
[0129] In step 1310, the processor (for example, processor 402 in Figure 4) selects a first set of probabilistic parameters from a predefined set of probabilistic parameters based on flags or parameters signaled to the bitstream. The first set of probabilistic parameters is used to initialize the context model probabilities used in the CABAC of the first slice.
[0130] For example, the flag may be signaled in an SPS, a PPS, a picture header, or a slice header. For instance, if the flag is signaled in an SPS, the set of probability parameters indicated by the flag may be used for all slices associated with that SPS.
[0131] As another example, a flag or parameter may have a value depending on whether a low-latency condition or a non-low-latency condition is used to encode the first slice. For example, when a low-latency condition is enabled, the value of the flag or parameter (e.g., "2") may refer to the first of several predefined sets of probabilistic parameters. When a non-low-latency condition is enabled, the same value of the flag or parameter (i.e., "2") may refer to the second of several predefined sets of probabilistic parameters.
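The condition-dependent interpretation of a signaled value can be sketched with two lookup tables. The table contents beyond the value 2 used in the text, and all names, are hypothetical.

```python
# Hypothetical tables: the same signaled value refers to a different
# predefined set depending on the coding condition.
LOW_LATENCY_TABLE = {2: "first_set"}
NON_LOW_LATENCY_TABLE = {2: "second_set"}


def resolve_signaled_value(value: int, low_latency: bool) -> str:
    """Interpret a signaled flag or parameter value under the active condition."""
    table = LOW_LATENCY_TABLE if low_latency else NON_LOW_LATENCY_TABLE
    return table[value]
```

Reusing one value range under both conditions keeps the signaled parameter short, at the cost of requiring the decoder to know which condition applies.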
[0132] In step 1320, the processor performs entropy decoding of the first slice based on one or more context models and a first set of probabilistic parameters.
[0133] The encoding method corresponding to the decoding method described above may be performed by an encoder. For example, Figure 14 shows a flowchart of an exemplary method 1400 for encoding a bitstream associated with video, according to some embodiments of the present disclosure. During CABAC execution, method 1400 may be performed by an encoder (e.g., by process 200A in Figure 2A or process 200B in Figure 2B) or by one or more software or hardware components of a device (e.g., device 400 in Figure 4). For example, one or more processors (e.g., processor 402 in Figure 4) can perform method 1400. In some embodiments, method 1400 may be implemented by a computer program product embodied on a computer-readable medium, which includes computer-executable instructions such as program code executed by a computer (e.g., device 400 in Figure 4). As shown in Figure 14, method 1400 includes the following steps 1410-1420.
[0134] In step 1410, the processor (for example, processor 402 in Figure 4) selects a first set of probability parameters for the first slice to initialize one or more context models used in CABAC. In some embodiments, the selection can be based on at least one of the slice type, quantization parameter (QP), temporal identifier, low-latency condition, or rate cost associated with the first slice.
[0135] In step 1420, the processor performs entropy coding of the first slice based on one or more context models and a first set of probabilistic parameters.
[0136] Figure 15 shows a flowchart of an exemplary method 1500 for encoding a bitstream associated with video, according to some embodiments of the present disclosure. Similar to method 1400, method 1500 may also be performed by an encoder processor. As shown in Figure 15, method 1500 includes the following steps 1510-1520.
[0137] In step 1510, the processor encodes, in the bitstream, a flag or parameter indicating a first set of probability parameters. The flag or parameter is associated with the first slice and signals that the first set of probability parameters is selected to perform CABAC on the first slice. The selection can be based on at least one of the slice type, quantization parameter (QP), temporal identifier, low-latency condition, or rate cost associated with the first slice (see step 1410 in Figure 14).
[0138] Referring again to Figure 15, in step 1520, the processor performs entropy coding of the first slice based on one or more context models and a first set of probabilistic parameters.
[0139] Furthermore, the disclosed methods can be freely combined.
[0140] In some embodiments, a non-transitory computer-readable storage medium for storing the bitstream is further provided. The context model probability set selected for a slice can be signaled in the bitstream.
[0141] In some embodiments, a non-transitory computer-readable storage medium containing instructions is also provided, and the instructions may be executed by a device (e.g., the disclosed encoder and decoder) to perform the above-described methods. Common forms of non-transitory media include, for example, floppy disks, flexible disks, hard disks, solid-state drives, magnetic tapes or any other magnetic data storage media, CD-ROMs or any other optical data storage media, any physical media with patterns of holes, RAM, PROM, EPROM, FLASH-EPROM or any other flash memory, NVRAM, caches, registers, any other memory chips or cartridges, and networked versions of the same. The device may include one or more processors (CPUs), input/output interfaces, network interfaces, and/or memory.
[0142] Embodiments can be further described using the following clauses.
1. A method for decoding a bitstream associated with a video sequence, comprising: selecting a first set of probability parameters from a plurality of predefined sets of probability parameters to initialize one or more context models of a B-slice; and performing entropy decoding of the B-slice based on the one or more context models and the first set of probability parameters, wherein the selecting is based on a coding condition of the B-slice or a signal in the bitstream.
2. The method according to clause 1, wherein the first set of probability parameters comprises at least one of: initial probabilities for use with the one or more context models; adaptation rates of the one or more context models; a plurality of weights associated with a plurality of probabilities; or adjusted probabilities for use with the one or more context models.
3. The method according to clause 1, wherein the coding condition includes at least one of a quantization parameter (QP), a temporal identifier, a low-latency condition, a non-low-latency condition, or a rate cost associated with the first slice.
4. The method according to clause 1, wherein the signal includes a flag or parameter in the bitstream.
5. The method according to clause 4, wherein the selecting is based on a flag signaled in a sequence parameter set (SPS), a picture parameter set (PPS), a picture header, or a slice header.
6. The method according to clause 4, wherein the flag or parameter has a value depending on whether a low-latency condition or a non-low-latency condition is used to encode the B-slice.
7. The method according to clause 1, wherein the plurality of predefined sets includes two predefined sets of probability parameters for the B-slice.
8. A method for encoding a bitstream associated with a video sequence, comprising: selecting a first set of probability parameters from a plurality of predefined sets of probability parameters to initialize one or more context models of a B-slice; and performing entropy coding of the B-slice based on the one or more context models and the first set of probability parameters, wherein the selecting is based on a coding condition of the B-slice or a signal in the bitstream.
9. The method according to clause 8, wherein the first set of probability parameters comprises at least one of: initial probabilities for use with the one or more context models; adaptation rates of the one or more context models; a plurality of weights associated with a plurality of probabilities; or adjusted probabilities for use with the one or more context models.
10. The method according to clause 8, wherein the coding condition includes at least one of a quantization parameter (QP), a temporal identifier, a low-latency condition, a non-low-latency condition, or a rate cost associated with the first slice.
11. The method according to clause 8, further comprising encoding, in the bitstream, a flag or parameter associated with the B-slice, wherein the flag or parameter indicates that the first set of probability parameters is selected.
12. The method according to clause 11, wherein the flag is signaled in a sequence parameter set (SPS), a picture parameter set (PPS), a picture header, or a slice header.
13. The method according to clause 11, wherein the flag or parameter has a value depending on whether a low-latency condition or a non-low-latency condition is used to encode the first slice.
14. The method according to clause 8, wherein the plurality of predefined sets includes two predefined sets of probability parameters for the B-slice.
15. A non-transitory computer-readable storage medium storing a bitstream of a video, wherein the bitstream is used to perform a process comprising: selecting a first set of probability parameters from a plurality of predefined sets of probability parameters to initialize one or more context models of a B-slice; and performing entropy coding or decoding of the B-slice based on the one or more context models and the first set of probability parameters, wherein the selecting is based on a coding condition of the B-slice or a signal in the bitstream.
16. The non-transitory computer-readable storage medium according to clause 15, wherein the first set of probability parameters comprises at least one of: initial probabilities for use with the one or more context models; adaptation rates of the one or more context models; a plurality of weights associated with a plurality of probabilities; or adjusted probabilities for use with the one or more context models.
17. The non-transitory computer-readable storage medium according to clause 15, wherein the coding condition includes at least one of a quantization parameter (QP), a temporal identifier, a low-latency condition, a non-low-latency condition, or a rate cost associated with the first slice.
18. The non-transitory computer-readable storage medium according to clause 15, wherein the signal includes a flag or parameter in the bitstream.
19. The non-transitory computer-readable storage medium according to clause 18, wherein the selecting is based on a flag signaled in a sequence parameter set (SPS), a picture parameter set (PPS), a picture header, or a slice header.
20. The non-transitory computer-readable storage medium according to clause 18, wherein the flag or parameter has a value depending on whether a low-latency condition or a non-low-latency condition is used to encode the B-slice.
21. The non-transitory computer-readable storage medium according to clause 15, wherein the plurality of predefined sets includes two predefined sets of probability parameters for the B-slice.
[0143] In this specification, relational terms such as "first" and "second" are used solely to distinguish one entity or action from another, and do not imply or require any actual relationship or order between these entities or actions. Furthermore, "comprising," "having," "containing," "including," and other similar forms are intended to be equivalent in meaning and open-ended, in that the item or items following any of these words are not meant to be an exhaustive list of such items, nor is the list meant to be limited to only the items listed.
[0144] As used herein, unless otherwise specified, the term “or” includes all possible combinations, except where impossible. For example, if it is stated that a database may contain A or B, then unless otherwise specified or impossible, the database may contain A or B, or A and B. As a second example, if it is stated that a database may contain A, B, or C, then unless otherwise specified or impossible, the database may contain A or B or C, or A and B, or A and C, or B and C, or A and B and C.
[0145] It is understood that the above embodiments can be implemented by hardware, software (program code), or a combination of hardware and software. When implemented by software, it may be stored on the computer-readable medium. When executed by a processor, the software can perform the methods disclosed. The computing units and other functional units described in this disclosure can be implemented by hardware, software, or a combination of hardware and software. It is also understood by those skilled in the art that several of the above modules / units may be combined into a single module / unit, and each of the above modules / units may be further divided into several submodules / subunits.
[0146] In the above specification, embodiments have been described with reference to numerous specific details that differ depending on the embodiment. Adaptations and modifications can be made to the above embodiments. Given the detailed description and practice of the invention disclosed herein, other embodiments will be obvious to those skilled in the art. This specification and the examples are illustrative, and the true scope and spirit of the invention are intended to be shown by the following claims. Furthermore, the order of steps shown in the drawings is for illustrative purposes only and is not intended to limit the order of steps to a particular set of steps. Thus, those skilled in the art will understand that these steps can be performed in different orders when carrying out the same method.
[0147] Exemplary embodiments are disclosed in the drawings and specification. However, many variations and modifications are possible in these embodiments. Therefore, although specific terms are used, they are used only in a general and descriptive sense and not for the purpose of limitation.
Claims
1. A method for decoding a bitstream associated with a video sequence, comprising: selecting a first set of probability parameters from a plurality of predefined sets of probability parameters to initialize one or more context models of a B-slice; and performing entropy decoding of the B-slice based on the one or more context models and the first set of probability parameters, wherein the selecting is based on a coding condition of the B-slice or a signal in the bitstream.
2. The method according to claim 1, wherein the first set of probability parameters comprises at least one of: initial probabilities for use with the one or more context models; adaptation rates of the one or more context models; a plurality of weights associated with a plurality of probabilities; or adjusted probabilities for use with the one or more context models.
3. The method according to claim 1, wherein the coding condition includes at least one of a quantization parameter (QP) associated with a first slice, a temporal identifier, a low-latency condition, a non-low-latency condition, or a rate cost.
4. The method according to claim 1, wherein the signal includes a flag or parameter in the bitstream.
5. The method according to claim 4, wherein the selecting is based on a flag signaled in a sequence parameter set (SPS), a picture parameter set (PPS), a picture header, or a slice header.
6. The method according to claim 4, wherein the flag or parameter has a value depending on whether a low-latency condition or a non-low-latency condition is used to encode the B-slice.
7. The method according to claim 1, wherein the plurality of predefined sets includes two predefined sets of probability parameters for the B-slice.
8. A method for encoding a bitstream associated with a video sequence, The steps include selecting a first set of probabilistic parameters from a predefined set of probabilistic parameters to initialize one or more context models of a B-slice, The step of performing entropy coding of the B slice based on one or more context models and the first set of probability parameters, The step of selection is based on the coding conditions of the B slice or the signals in the bitstream.
9. The first set of probability parameters is: Initial probabilities for use with one or more context models, The adaptation rate of one or more of the aforementioned context models, Multiple weights associated with multiple probabilities, or The method according to claim 8, comprising at least one of the adjusted probabilities for use with the one or more context models.
10. The method according to claim 8, wherein the coding condition includes at least one of a quantization parameter (QP) associated with the first slice, a time identifier, a low-latency condition, a non-low-latency condition, or a rate cost.
11. The step further includes encoding a flag or parameter associated with the B slice in the bitstream, wherein the flag or parameter indicates that the first set of probabilistic parameters is selected. The method according to claim 8.
12. The method according to claim 11, wherein the flag is signaled to a sequence parameter set (SPS), a picture parameter set (PPS), a picture header, or a slice header.
13. The method according to claim 11, wherein the flag or parameter has a value depending on whether a low-latency condition or a non-low-latency condition is used to encode the first slice.
14. The method according to claim 8, wherein the predefined sets of probability parameters include two predefined sets of probability parameters for the B slice.
15. A non-transitory computer-readable storage medium storing a video bitstream, wherein the bitstream is used in a process comprising: selecting a first set of probability parameters from predefined sets of probability parameters to initialize one or more context models of a B slice; and performing entropy encoding or decoding of the B slice based on the one or more context models and the first set of probability parameters, wherein the selecting is based on a coding condition of the B slice or a signal in the bitstream.
16. The non-transitory computer-readable storage medium according to claim 15, wherein the first set of probability parameters comprises at least one of: initial probabilities for use with the one or more context models, adaptation rates of the one or more context models, multiple weights associated with multiple probabilities, or adjusted probabilities for use with the one or more context models.
17. The non-transitory computer-readable storage medium according to claim 15, wherein the coding condition includes at least one of a quantization parameter (QP) associated with a first slice, a temporal identifier, a low-latency condition, a non-low-latency condition, or a rate cost.
18. The non-transitory computer-readable storage medium according to claim 15, wherein the signal includes a flag or parameter in the bitstream.
19. The non-transitory computer-readable storage medium according to claim 18, wherein the selection step is based on a flag signaled in a sequence parameter set (SPS), a picture parameter set (PPS), a picture header, or a slice header.
20. The non-transitory computer-readable storage medium according to claim 15, wherein the flag or parameter has a value depending on whether a low-latency condition or a non-low-latency condition is used to encode the B slice.
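
For illustration only, the selection and initialization recited in the claims above might be sketched as follows. This is a minimal sketch under stated assumptions, not the claimed implementation: the names `ProbabilityParams`, `PREDEFINED_B_SETS`, `select_set_index`, and `init_context_models` are hypothetical, the numeric values are placeholders not taken from any standard, and the mapping of set 1 to the low-latency condition is assumed. It covers only two of the recited parameter types (an initial probability and an adaptation rate).

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class ProbabilityParams:
    """Hypothetical per-context parameters: initial probability and adaptation rate."""
    init_prob: float   # initial probability that a bin equals 1
    adapt_shift: int   # adaptation rate expressed as a shift (larger = slower)


# Two hypothetical predefined sets for B slices, one entry per context model.
# All values are placeholders chosen for illustration.
PREDEFINED_B_SETS = {
    0: [ProbabilityParams(0.50, 4), ProbabilityParams(0.30, 5)],  # assumed: non-low-latency
    1: [ProbabilityParams(0.45, 5), ProbabilityParams(0.35, 6)],  # assumed: low-latency
}


def select_set_index(low_delay: bool, signaled_flag: Optional[int] = None) -> int:
    """Choose a set from a signaled flag if one is present, else from the coding condition."""
    if signaled_flag is not None:
        return signaled_flag
    return 1 if low_delay else 0


@dataclass
class ContextModel:
    prob: float
    adapt_shift: int


def init_context_models(set_index: int) -> List[ContextModel]:
    """Initialize one context model per entry of the selected parameter set."""
    return [ContextModel(p.init_prob, p.adapt_shift)
            for p in PREDEFINED_B_SETS[set_index]]
```

A decoder-side caller would pass the parsed flag as `signaled_flag`, while an encoder could call `select_set_index` with the coding condition alone and then write the resulting index into the bitstream.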