Probability estimation for entropy coding

By combining a multimodal linear update model with a CABAC model, the invention addresses the low efficiency of time-varying probability estimation in video compression and achieves more efficient compression of video data, making it suitable for real-time video encoding and decoding applications.

CN114556790B · Active · Publication Date: 2026-06-30 · GOOGLE LLC

Patent Information

Authority / Receiving Office
CN · China
Patent Type
Patents(China)
Current Assignee / Owner
GOOGLE LLC
Filing Date
2020-11-09
Publication Date
2026-06-30

AI Technical Summary

Technical Problem

Existing video compression techniques handle time-varying probability estimation inefficiently, which wastes video data processing and storage resources; effective lossless compression is especially difficult to achieve in real-time or latency-sensitive applications.

Method used

A multimodal linear update model is used to estimate the probabilities of symbol sequences. Combined with a context-adaptive binary arithmetic coding (CABAC) model, online probability estimation is performed within a single system, which improves the accuracy and efficiency of the probability model.

Benefits of technology

It improves the compression efficiency of video data and reduces the number of bits required to represent video data, making it suitable for real-time or latency-sensitive video encoding and decoding applications.

✦ Generated by Eureka AI based on patent content.

Abstract figure: CN114556790B_ABST

Abstract

A method for entropy coding a sequence of symbols is described. A first probability model is selected for entropy coding the sequence. At least one symbol of the sequence is entropy coded using a probability determined by the first probability model. Subsequent symbols are entropy coded using probabilities updated by a combination of an estimate from the first probability model and an estimate from a second probability model. The combination can be fixed or adaptive.

Description

[0001] Cross-references to related applications

[0002] This application claims priority to U.S. Provisional Patent Application No. 62/932,508, filed November 8, 2019, the entire contents of which are incorporated herein by reference.

Background

[0003] Digital video streams can be represented using a series of frames or still images. Digital video can be used in a variety of applications, including video conferencing, high-definition video entertainment, video advertising, or sharing user-generated videos. Digital video streams can contain large amounts of data and consume significant computing or communication resources to process, transmit, or store the video data. Various methods have been proposed to reduce the amount of data in video streams, including lossy and lossless compression techniques.

Summary of the Invention

[0004] Probability estimation is used for entropy coding, particularly for context-based entropy coding in lossless compression. This disclosure describes a multimodal approach that uses multiple linear update models to estimate probabilities accurately.

[0005] One aspect of the teachings herein is a method for entropy coding a sequence of symbols. The method may include: determining a first probability model for entropy coding the sequence, the first probability model being one of a plurality of available probability models; entropy coding at least one symbol of the sequence using a probability determined by the first probability model; after entropy coding a respective symbol of the sequence, determining a first probability estimate, using the first probability model, to update the probability; for a subsequent symbol relative to the at least one symbol of the sequence, determining a second probability estimate using a second probability model; and entropy coding the subsequent symbol using the probability updated by a combination of the first probability estimate and the second probability estimate. An apparatus for performing the method is also described.
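The steps of [0005] can be illustrated with a minimal Python sketch. It assumes binary symbols, uses two first-order linear update models with hypothetical adaptation rates (one fast, one slow), and combines their estimates with a fixed weight. The names `LinearModel` and `code_sequence`, the rates, the weight, the one-symbol warm-up, and the probability clamp are all illustrative assumptions, not parameters taken from the patent. In place of a real arithmetic coder, the sketch sums the ideal code length, -log2 of the probability assigned to each coded symbol.

```python
import math
from dataclasses import dataclass

P_MIN = 1.0 / (1 << 15)  # assumed clamp bound; keeps -log2 finite


@dataclass
class LinearModel:
    """First-order linear update model for a binary symbol (illustrative).

    `p` is the current estimate of P(symbol == 1).
    """
    rate: float  # adaptation rate in (0, 1); larger values adapt faster
    p: float = 0.5

    def update(self, bit: int) -> None:
        # Move the estimate a fraction `rate` of the way toward the observed bit.
        self.p += self.rate * (bit - self.p)


def clamp(p: float) -> float:
    # Keep probabilities away from exactly 0 and 1 (bound is an assumption).
    return min(max(p, P_MIN), 1.0 - P_MIN)


def code_sequence(bits, weight=0.5, fast_rate=0.2, slow_rate=0.02, warmup=1):
    """Ideal code length (bits) of `bits` under a fixed combination of two models.

    The first `warmup` symbols are coded with the first model alone; later
    symbols use weight * first + (1 - weight) * second.
    """
    first = LinearModel(rate=fast_rate)
    second = LinearModel(rate=slow_rate)
    total = 0.0
    for t, bit in enumerate(bits):
        if t < warmup:
            p1 = first.p
        else:
            p1 = weight * first.p + (1.0 - weight) * second.p
        p1 = clamp(p1)
        # An ideal arithmetic coder spends -log2(P(bit)) bits on this symbol.
        total += -math.log2(p1 if bit == 1 else 1.0 - p1)
        # After coding the symbol, both models update with the observed bit.
        first.update(bit)
        second.update(bit)
    return total
```

For example, `code_sequence([1] * 200)` comes out far below 200 bits because both estimates move toward 1 as the run continues. The abstract notes that the combination can also be adaptive; in this sketch that would mean updating `weight` over time instead of keeping it fixed.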

[0006] Aspects of this disclosure are disclosed in the following detailed description of the embodiments, the appended claims, and the accompanying drawings.

Brief Description of the Drawings

[0007] The description herein refers to the accompanying drawings, in which the same reference numerals refer to the same parts in the various views.

[0008] Figure 1 is a schematic diagram of an example video encoding and decoding system.

[0009] Figure 2 is a block diagram of an example computing device that can implement a transmitting station or a receiving station.

[0010] Figure 3 is a schematic diagram of an example video stream to be encoded and subsequently decoded.

[0011] Figure 4 is a block diagram of an example encoder.

[0012] Figure 5 is a block diagram of an example decoder.

[0013] Figure 6 is a schematic diagram illustrating quantized transform coefficients according to an embodiment of the present disclosure.

[0014] Figure 7 is a schematic diagram of a coefficient token tree that can be used to entropy code blocks into a video bitstream according to embodiments of this disclosure.

[0015] Figure 8 is a schematic diagram of a tree for binarizing quantized transform coefficients according to an embodiment of the present disclosure.

[0016] Figure 9 is a flowchart of a method for entropy coding a sequence of symbols according to the teachings herein.

Detailed Description

[0017] Video compression schemes may include breaking each image or frame into smaller portions, such as blocks, and generating an encoded bitstream using techniques that limit the information included for each block. The encoded bitstream can be decoded to re-create the source images from the limited information. The information may be limited by lossy coding, lossless coding, or some combination of lossy and lossless coding.

[0018] One type of lossless coding is entropy coding, where entropy is generally considered to be the degree of disorder or randomness in a system. Entropy coding compresses a sequence in an informationally efficient way: the entropy of the original sequence is a lower bound on the length of the compressed sequence. An efficient entropy coding algorithm generates a code whose length (e.g., in bits) is close to that entropy. For a sequence of length N, the entropy associated with a binary codeword can be defined by the following equation (1):

[0019]

[0020] The variable p represents the probability of an individual symbol, and the variable p_t represents the probability distribution of the codeword at time t, conditioned on previously observed symbols. Arithmetic coding can use these probabilities to construct the codewords.
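To illustrate how arithmetic coding turns probabilities into a codeword, the Python sketch below narrows an interval for a binary sequence using exact fractions. It is an idealized, infinite-precision model with no renormalization, and the function name `final_interval_width` is hypothetical. The width of the final interval equals the product of the probabilities assigned to the coded symbols, so roughly -log2(width) bits are needed to identify a point inside it.

```python
import math
from fractions import Fraction


def final_interval_width(bits, probs_of_one):
    """Narrow [low, low + width) once per binary symbol.

    probs_of_one[t] is the probability assigned to symbol t being 1.
    """
    low, width = Fraction(0), Fraction(1)
    for bit, p1 in zip(bits, probs_of_one):
        p1 = Fraction(p1)
        if bit == 0:
            # Symbol 0 keeps the lower part of the interval.
            width *= 1 - p1
        else:
            # Symbol 1 keeps the upper part of the interval.
            low += width * (1 - p1)
            width *= p1
    return low, width


bits = [1, 0, 1, 1]
probs = [Fraction(3, 4)] * len(bits)
low, width = final_interval_width(bits, probs)
print(width)              # 27/256, i.e. (3/4) * (1/4) * (3/4) * (3/4)
print(-math.log2(width))  # ideal code length in bits
```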

[0021] However, the coder does not know the probability distribution of a streaming sequence of symbols. Instead, probability estimation can be used in video coding to implement entropy coding; that is, the probability distribution of the symbols is estimated. In this case, the code length approaches that given by the following equation (2):

[0022]

[0023] In other words, entropy coding can rely on a probability estimation model (also referred to herein as a probability model) that models the distribution of values occurring in the encoded bitstream. By using a probability model based on a measured or estimated distribution of values, such that the estimated distribution approaches p_t, entropy coding can reduce the number of bits required to represent the input data toward the theoretical minimum (i.e., the lower bound).
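The effect of estimation accuracy can be made concrete with a small Python example. It assumes i.i.d. binary symbols with a known true probability, which is a simplification for illustration only, since the text stresses that real probabilities are time-varying. Under that assumption, the expected number of bits per symbol when coding with an estimate q is the cross-entropy, which is never below the entropy and equals it only when q matches the true probability. The function names are illustrative.

```python
import math


def entropy_bits(p: float) -> float:
    """Entropy (bits/symbol) of a binary source with P(1) = p."""
    return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))


def cross_entropy_bits(p: float, q: float) -> float:
    """Expected bits/symbol when the true P(1) is p but the coder uses q."""
    return -(p * math.log2(q) + (1 - p) * math.log2(1 - q))


true_p = 0.9
for q in (0.5, 0.8, 0.9, 0.95):
    extra = cross_entropy_bits(true_p, q) - entropy_bits(true_p)
    print(f"q={q:.2f}: {cross_entropy_bits(true_p, q):.3f} bits/symbol "
          f"(+{extra:.3f} over entropy)")
```

The extra bits for a mismatched q are the overhead that a more accurate probability model removes.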

[0024] In practice, the actual reduction in the number of bits required to represent video data can be a function of the accuracy of the probability model, the number of bits over which the coding is performed, and the computational precision of the arithmetic (e.g., fixed-point) used to perform the coding. A significant difficulty in the estimation is that the probability is time-varying, meaning that p_t cannot be replaced by a single value p.

[0025] To address the time-varying nature of the probability, this disclosure describes a probability estimation method that combines one probability estimation model, which is a first-order linear system, with another model to form a higher-order linear system. Although the teachings herein can be used in both one-pass and two-pass coding systems, the probability estimation described herein can be termed online probability estimation because it can be used efficiently in one-pass systems. The available probability estimation models can include a context-adaptive binary arithmetic coding (CABAC) model, an AV1 model, a counting model, or any other probability estimation model or algorithm.
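A first-order linear update can be written as follows, assuming a binary symbol x_t ∈ {0, 1}, an estimate p_t of P(x_t = 1), and an adaptation rate α in (0, 1). The combination shown, a fixed weight w over two such models with rates α_1 ≠ α_2, is one illustrative way to obtain a higher-order system; the notation is introduced here and is not taken from the patent's equations.

```latex
\begin{aligned}
p_{t+1} &= (1-\alpha)\,p_t + \alpha\,x_t
  \quad\Longrightarrow\quad
  H_\alpha(z) = \frac{\alpha\,z^{-1}}{1-(1-\alpha)\,z^{-1}} \\[4pt]
\hat{p}_t &= w\,p^{(1)}_t + (1-w)\,p^{(2)}_t
  \quad\Longrightarrow\quad
  H(z) = \frac{w\,\alpha_1 z^{-1}}{1-(1-\alpha_1)\,z^{-1}}
       + \frac{(1-w)\,\alpha_2 z^{-1}}{1-(1-\alpha_2)\,z^{-1}}
\end{aligned}
```

Each H_α has a single pole at 1 - α, so each model on its own is first order. Over a common denominator, H(z) has the two poles 1 - α_1 and 1 - α_2, so the combined estimator is a second-order linear system: the model with the larger rate tracks changes in p_t quickly, while the one with the smaller rate averages out noise.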

[0026] Embodiments according to this disclosure can more accurately model the conditional probabilities of streaming symbols, enabling efficient probability estimation for entropy coding, particularly context-based entropy coding in lossless compression. The probability estimation facilitates efficient compression, reducing the number of bits required to represent video data. It can be used for probability estimation of any symbol sequence, but it may be particularly effective for online estimation of such sequences (e.g., in real-time or latency-sensitive video coding applications).

[0027] Further details of estimating the probabilities used for entropy coding of symbols are described herein, first with reference to a system in which the teachings may be incorporated.

[0028] Figure 1 is a schematic diagram of an example video encoding and decoding system 100. The transmitting station 102 can be, for example, a computer having an internal hardware configuration such as that described with respect to Figure 2. However, other implementations of the transmitting station 102 are possible. For example, the processing of the transmitting station 102 can be distributed among multiple devices.

[0029] Network 104 can connect sending station 102 and receiving station 106 to encode and decode video streams. Specifically, the video stream can be encoded at sending station 102, and the encoded video stream can be decoded at receiving station 106. For example, network 104 can be the Internet. Network 104 can also be a local area network (LAN), a wide area network (WAN), a virtual private network (VPN), a cellular phone network, or any other means of transmitting video streams from sending station 102 to receiving station 106 in this example.

[0030] In one example, the receiving station 106 can be a computer having an internal hardware configuration such as that described with respect to Figure 2. However, other suitable implementations of the receiving station 106 are possible. For example, the processing of the receiving station 106 can be distributed among multiple devices.

[0031] Other implementations of the video encoding and decoding system 100 are possible. For example, an implementation can omit the network 104. In another implementation, a video stream can be encoded and then stored for later transmission to the receiving station 106 or to any other device having memory. In one implementation, the receiving station 106 receives the encoded video stream (e.g., via the network 104, a computer bus, and/or some communication pathway) and stores the video stream for later decoding. In an example implementation, the Real-time Transport Protocol (RTP) is used to transmit the encoded video over the network 104. In another implementation, a transport protocol other than RTP may be used (e.g., a video streaming protocol based on the Hypertext Transfer Protocol (HTTP)).

[0032] When used in a video conferencing system, for example, the sending station 102 and/or the receiving station 106 may include the ability to both encode and decode a video stream as described below. For example, the receiving station 106 may be a video conference participant who receives an encoded video bitstream from a video conference server (e.g., the sending station 102) to decode and view, and who further encodes and transmits their own video bitstream to the video conference server for decoding and viewing by other participants.

[0033] In some embodiments, the video encoding and decoding system 100 can instead be used to encode and decode data other than video data. For example, the system 100 can be used to process image data, which may include blocks of data from an image. In this embodiment, the transmitting station 102 can be used to encode the image data and the receiving station 106 can be used to decode the image data. Alternatively, the receiving station 106 can represent a computing device that stores the encoded image data for later use, such as after receiving the encoded or pre-encoded image data from the transmitting station 102. As a further alternative, the transmitting station 102 can represent a computing device that decodes the image data, such as before transmitting the decoded image data to the receiving station 106 for display.

[0034] Figure 2 is a block diagram of an example of a computing device 200 that can implement a transmitting station or a receiving station. For example, the computing device 200 can implement one or both of the transmitting station 102 and the receiving station 106 of Figure 1. The computing device 200 can be in the form of a computing system including multiple computing devices, or in the form of a single computing device, such as a mobile phone, a tablet computer, a laptop computer, a notebook computer, a desktop computer, and the like.

[0035] The processor 202 in the computing device 200 can be a conventional central processing unit. Alternatively, the processor 202 can be another type of device, or multiple devices, now existing or hereafter developed, capable of manipulating or processing information. For example, although the disclosed implementations can be practiced with a single processor as shown (e.g., the processor 202), advantages in speed and efficiency can be achieved by using more than one processor.

[0036] In one embodiment, the memory 204 in the computing device 200 can be a read-only memory (ROM) device or a random access memory (RAM) device. Any other suitable type of storage device can be used as the memory 204. The memory 204 can include code and data 206 that is accessed by the processor 202 using a bus 212. The memory 204 can further include an operating system 208 and application programs 210, the application programs 210 including at least one program that permits the processor 202 to perform the techniques described herein. For example, the application programs 210 can include applications 1 through N, which further include a video coding application that performs the techniques described herein. The computing device 200 can also include a secondary storage device 214, for example, a memory card used with a mobile computing device. Because video communication sessions may contain a large amount of information, they can be stored in whole or in part in the secondary storage device 214 and loaded into the memory 204 as needed for processing.

[0037] The computing device 200 may also include one or more output devices, such as a display 218. In one example, the display 218 may be a touch-sensitive display that combines a display with a touch-sensitive element operable to sense touch input. The display 218 may be coupled to the processor 202 via a bus 212. In addition to or as an alternative to the display 218, other output devices may be provided that allow the user to program or otherwise use the computing device 200. When the output device is a display or includes a display, the display may be implemented in various ways, including via a liquid crystal display (LCD), a cathode ray tube (CRT) display, or a light-emitting diode (LED) display, such as an organic LED (OLED) display.

[0038] The computing device 200 can also include or be in communication with an image sensing device 220, for example, a camera, or any other image sensing device 220 now existing or hereafter developed that can sense an image, such as the image of a user operating the computing device 200. The image sensing device 220 can be positioned such that it is directed toward the user operating the computing device 200. In an example, the position and optical axis of the image sensing device 220 can be configured such that the field of view includes an area that is directly adjacent to the display 218 and from which the display 218 is visible.

[0039] The computing device 200 can also include a sound sensing device 222, for example, a microphone, or any other sound sensing device now existing or hereafter developed that can sense sounds near the computing device 200. The sound sensing device 222 can be positioned such that it is directed toward the user operating the computing device 200 and can be configured to receive sounds, such as speech or other utterances, made by the user while operating the computing device 200.

[0040] Although Figure 2 depicts the processor 202 and the memory 204 of the computing device 200 as being integrated into a single unit, other configurations can be utilized. The operations of the processor 202 can be distributed across multiple machines (where an individual machine can have one or more processors) that can be coupled directly or across a local area network or other network. The memory 204 can be distributed across multiple machines, such as a network-based memory or memory in multiple machines performing the operations of the computing device 200. Although depicted here as a single bus, the bus 212 of the computing device 200 can be composed of multiple buses. Further, the secondary storage device 214 can be directly coupled to the other components of the computing device 200 or can be accessed via a network, and can comprise an integrated unit such as a memory card or multiple units such as multiple memory cards. The computing device 200 can thus be implemented in a wide variety of configurations.

[0041] Figure 3 is a schematic diagram of an example of a video stream 300 to be encoded and subsequently decoded. The video stream 300 includes a video sequence 302. At the next level, the video sequence 302 includes a number of adjacent frames 304. Although three frames are depicted as the adjacent frames 304, the video sequence 302 can include any number of adjacent frames 304. The adjacent frames 304 can then be further subdivided into individual frames, for example, a frame 306. At the next level, the frame 306 can be divided into a series of planes or segments 308. The segments 308 can be, for example, subsets of frames that permit parallel processing. The segments 308 can also be subsets of frames that separate the video data into separate colors. For example, a frame 306 of color video data can include a luma plane and two chroma planes. The segments 308 can be sampled at different resolutions.

[0042] Regardless of whether frame 306 is divided into segments 308, frame 306 can be further subdivided into blocks 310, which can contain data corresponding to, for example, 16×16 pixels in frame 306. Block 310 can also be configured to include data from one or more segments 308 of pixel data. Block 310 can also have any other suitable size, such as 4×4 pixels, 8×8 pixels, 16×8 pixels, 8×16 pixels, 16×16 pixels, or larger. Unless otherwise stated, the terms block and macroblock are used interchangeably herein.

[0043] Figure 4 is a block diagram of an example encoder 400. As described above, the encoder 400 can be implemented in the transmitting station 102, for example, by providing a computer software program stored in memory (e.g., the memory 204). The computer software program can include machine instructions that, when executed by a processor (e.g., the processor 202), cause the transmitting station 102 to encode video data in the manner described with respect to Figure 4. The encoder 400 can also be implemented as dedicated hardware included in, for example, the transmitting station 102. In a particularly preferred embodiment, the encoder 400 is a hardware encoder.

[0044] The encoder 400 has the following stages to perform the various functions in a forward path (shown by the solid connection lines) to produce an encoded or compressed bitstream 420 using the video stream 300 as input: an intra/inter-frame prediction stage 402, a transform stage 404, a quantization stage 406, and an entropy coding stage 408. The encoder 400 may also include a reconstruction path (shown by the dashed connection lines) to reconstruct a frame for encoding of future blocks. In Figure 4, the encoder 400 has the following stages to perform the various functions in the reconstruction path: a dequantization stage 410, an inverse transform stage 412, a reconstruction stage 414, and a loop filtering stage 416. Other structural variations of the encoder 400 can be used to encode the video stream 300.

[0045] When the video stream 300 is presented for encoding, each frame 306 of the adjacent frames 304 can be processed in units of blocks. At the intra/inter-frame prediction stage 402, each block can be encoded using intra-frame prediction (also called intra prediction) or inter-frame prediction (also called inter prediction). In either case, a prediction block can be formed. In the case of intra prediction, a prediction block can be formed from samples in the current frame that have been previously encoded and reconstructed. In the case of inter prediction, a prediction block can be formed from samples in one or more previously constructed reference frames.

[0046] Next, at the intra/inter-frame prediction stage 402, the prediction block can be subtracted from the current block to produce a residual block (also called a residual). The transform stage 404 uses block-based transforms to convert the residual into transform coefficients in, for example, the frequency domain. The quantization stage 406 uses a quantizer value or quantization level to convert the transform coefficients into discrete quantum values, which are referred to as quantized transform coefficients. For example, the transform coefficients can be divided by the quantizer value and truncated.
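The divide-and-truncate quantization described above, together with the multiply-based dequantization used in the reconstruction path and in the decoder, can be sketched in Python. A single scalar quantizer value `q` is assumed for every coefficient and the function names are hypothetical; practical codecs typically use per-frequency quantizers and rounding offsets.

```python
def quantize(coeffs, q):
    """Divide each transform coefficient by q and truncate toward zero."""
    return [int(c / q) for c in coeffs]


def dequantize(levels, q):
    """Multiply each quantized level by q (the inverse used for reconstruction)."""
    return [level * q for level in levels]


coeffs = [-37, 12, 5, -3, 0, 81]
levels = quantize(coeffs, 10)
print(levels)                  # [-3, 1, 0, 0, 0, 8]
print(dequantize(levels, 10))  # [-30, 10, 0, 0, 0, 80]
```

The difference between `coeffs` and the dequantized values is the information discarded by this lossy step; the entropy coding that follows is lossless.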

[0047] The quantized transform coefficients are then entropy encoded through entropy coding stage 408. The entropy-encoded coefficients, along with other information for decoding the block (e.g., syntax elements such as indications of the prediction type, transform type, motion vector, quantizer value, etc.), are then output to compressed bitstream 420. Compressed bitstream 420 can be formatted using various techniques, such as variable-length coding (VLC) or arithmetic coding. Compressed bitstream 420 may also be referred to as an encoded video stream or encoded video bitstream, and the terms are used interchangeably herein.

[0048] The reconstruction path (shown by the dashed connection lines) can be used to ensure that the encoder 400 and the decoder 500 (described below with respect to Figure 5) use the same reference frames to decode the compressed bitstream 420. The reconstruction path performs functions similar to those that take place during the decoding process (described below with respect to Figure 5), including dequantizing the quantized transform coefficients at the dequantization stage 410 and inverse transforming the dequantized transform coefficients at the inverse transform stage 412 to produce a derivative residual block (also called a derivative residual). At the reconstruction stage 414, the prediction block that was predicted at the intra/inter-frame prediction stage 402 can be added to the derivative residual to create a reconstructed block. The loop filtering stage 416 can be applied to the reconstructed block to reduce distortion such as blocking artifacts.

[0049] Other variations of the encoder 400 can be used to encode the compressed bitstream 420. In some embodiments, a non-transform-based encoder can quantize the residual signal directly, without the transform stage 404, for certain blocks or frames. In some embodiments, an encoder can have the quantization stage 406 and the dequantization stage 410 combined into a common stage.

[0050] Figure 5 is a block diagram of an example decoder 500. The decoder 500 can be implemented in the receiving station 106, for example, by providing a computer software program stored in the memory 204. The computer software program can include machine instructions that, when executed by a processor (such as the processor 202), cause the receiving station 106 to decode video data in the manner described with respect to Figure 5. The decoder 500 can also be implemented in hardware included in, for example, the transmitting station 102 or the receiving station 106.

[0051] Similar to the reconstruction path of encoder 400 discussed above, decoder 500, in one example, includes the following stages to perform various functions to produce output video stream 516 from compressed bitstream 420: entropy decoding stage 502, dequantization stage 504, inverse transform stage 506, intra / inter-frame prediction stage 508, reconstruction stage 510, loop filtering stage 512, and deblocking filtering stage 514. Other structural variations of decoder 500 can be used to decode compressed bitstream 420.

[0052] When the compressed bitstream 420 is presented for decoding, the data elements within the compressed bitstream 420 can be decoded in the entropy decoding stage 502 to produce a set of quantized transform coefficients. The dequantization stage 504 dequantizes the quantized transform coefficients (e.g., by multiplying the quantized transform coefficients by the quantizer value), and the inverse transform stage 506 performs an inverse transform on the dequantized transform coefficients to produce a derivative residual, which can be the same as the derivative residual produced in the encoder 400 in the inverse transform stage 412. Using the header information decoded from the compressed bitstream 420, the decoder 500 can use the intra / inter-frame prediction stage 508 to create the same prediction block as the one created in the encoder 400 (e.g., in the intra / inter-frame prediction stage 402).

[0053] In reconstruction stage 510, the predicted block can be added to the derivative residual to produce a reconstructed block. Loop filtering stage 512 can be applied to the reconstructed block to reduce block artifacts. Other filtering can be applied to the reconstructed block. In this example, deblocking filtering stage 514 is applied to the reconstructed block to reduce block distortion, and the output is given as output video stream 516. Output video stream 516 can also be referred to as decoded video stream, and the terms are used interchangeably herein. Other variations of decoder 500 can be used to decode compressed bitstream 420. In some implementations, decoder 500 can produce output video stream 516 without deblocking filtering stage 514.

[0054] As can be seen from the above description of the encoder 400 and the decoder 500, bits are generally used for one of two things when encoding a video bitstream: content prediction (e.g., inter mode/motion vector coding, intra prediction mode coding, etc.) or residual or coefficient coding (e.g., transform coefficients). Encoders can use techniques to reduce the bits spent on coefficient coding. For example, a coefficient token tree (also called a binary token tree) specifies the range of a value, with forward-adaptive probabilities for each branch of the token tree. The token base value is subtracted from the value to be coded to form a residual, and then the block is coded with fixed probabilities. A similar scheme with minor variations, including backward adaptivity, is also possible. Adaptive techniques can alter the probability models as the video stream is encoded to adapt to the changing characteristics of the data. In any event, the decoder is informed of (or has available) the probability model used to entropy code the video bitstream so that it can decode the video bitstream.

[0055] Before describing how the probability estimates for a sequence of symbols are updated, the development of the symbol sequence is described starting with Figure 6.

[0056] Figure 6 is a schematic diagram 600 of quantized transform coefficients according to an embodiment of the present disclosure. The diagram 600 depicts a current block 620, a scan order 602, a quantized transform block 604, a non-zero map 606, an end-of-block map 622, and a sign map 626. The current block 620 is illustrated as a 4×4 block; however, any block size is possible. For example, the current block can have a size (i.e., dimensions) of 4×4, 8×8, 16×16, 32×32, or any other square or rectangular block size. The current block 620 can be a block of a current frame. In another example, the current frame can be partitioned into segments (such as the segments 308 of Figure 3), tiles, or the like, each including a collection of blocks, where the current block is a block of the partition.

[0057] The quantized transform block 604 can be a block of a size similar to that of the current block 620. The quantized transform block 604 includes non-zero coefficients (e.g., a coefficient 608) and zero coefficients (e.g., a coefficient 610). As described above, the quantized transform block 604 contains the quantized transform coefficients of the residual block corresponding to the current block 620. Also as described above, the quantized transform coefficients are entropy coded by an entropy coding stage, such as the entropy coding stage 408 of Figure 4.

[0058] Entropy coding a quantized transform coefficient can involve selecting a context model (also referred to as a probability context model, probability model, model, or context) that provides an estimate of the conditional probability for coding the binary symbols of a binarized transform coefficient, as described below with respect to Figure 7. When entropy coding a quantized transform coefficient, additional information can be used as the context for selecting a context model. For example, the magnitudes of previously coded transform coefficients can be used, at least in part, to determine the probability model.
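As a hypothetical illustration of using previously coded magnitudes as context, the Python sketch below derives a context index for a coefficient from the magnitudes of its left and above neighbours, capped to a small number of contexts. In a zigzag scan those two neighbours are always coded before the current coefficient. The neighbour set, the cap `max_ctx`, and the function name are assumptions for illustration; the text does not specify them.

```python
def context_index(levels, row, col, max_ctx=4):
    """Pick a context from the magnitudes of the left and above neighbours.

    `levels` is a 2-D list of quantized coefficients; neighbours outside the
    block count as zero. Each context index would select its own probability
    model.
    """
    total = 0
    if col > 0:
        total += abs(levels[row][col - 1])  # left neighbour
    if row > 0:
        total += abs(levels[row - 1][col])  # above neighbour
    return min(total, max_ctx)


block = [
    [5, 2, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 0],
    [0, 0, 0, 0],
]
print(context_index(block, 0, 0))  # 0: no coded neighbours
print(context_index(block, 1, 1))  # 3: |1| + |2|
print(context_index(block, 0, 1))  # 4: |5| capped at max_ctx
```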

[0059] To encode a transform block, a video coding system can traverse the transform block in a scan order and encode (e.g., entropy code) the quantized transform coefficients as they are traversed (i.e., visited). In a zigzag scan order, such as the scan order 602, the coefficient at the top-left corner of the transform block (also called the DC coefficient) is traversed and encoded first, then the next coefficient in the scan order (i.e., the transform coefficient at the position labeled "1") is traversed and encoded, and so on. In the zigzag scan order (i.e., the scan order 602), some quantized transform coefficients above and to the left of the current quantized transform coefficient (e.g., the transform coefficient to be encoded) are traversed first. Other scan orders are possible. Traversing a two-dimensional quantized transform block in a scan order yields a one-dimensional structure (e.g., an array) of quantized transform coefficients.
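A zigzag scan order of the kind described can be generated and applied in Python as follows. This is the common JPEG-style zigzag, assumed here for illustration; the exact order of the scan order 602 in Figure 6 may differ, and the function names are hypothetical.

```python
def zigzag_order(n):
    """Return (row, col) positions of an n x n block in zigzag scan order."""
    order = []
    for s in range(2 * n - 1):  # s = row + col indexes one anti-diagonal
        diagonal = [(r, s - r) for r in range(n) if 0 <= s - r < n]
        if s % 2 == 0:
            diagonal.reverse()  # even diagonals run bottom-left to top-right
        order.extend(diagonal)
    return order


def scan(block):
    """Flatten a 2-D block into a 1-D list following the zigzag order."""
    return [block[r][c] for r, c in zigzag_order(len(block))]


# Label each position of a 4x4 block with its raster index to show the order.
block = [[r * 4 + c for c in range(4)] for r in range(4)]
print(scan(block))
# [0, 1, 4, 8, 5, 2, 3, 6, 9, 12, 13, 10, 7, 11, 14, 15]
```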

[0060] In some examples, encoding the quantized transform block 604 can include determining the non-zero map 606, which indicates which quantized transform coefficients of the quantized transform block 604 are zero and which are non-zero. In the non-zero map, non-zero coefficients and zero coefficients can be indicated by the values one (1) and zero (0), respectively. For example, the non-zero map 606 includes a non-zero 607 at the Cartesian location (0,0), corresponding to the coefficient 608, and a zero 608 at the Cartesian location (2,0), corresponding to the coefficient 610.

[0061] In some examples, encoding the quantized transform block 604 can include generating and encoding the end-of-block map 622. The end-of-block map indicates whether a non-zero quantized transform coefficient of the quantized transform block 604 is the last non-zero coefficient with respect to a given scan order. If a non-zero coefficient is not the last non-zero coefficient in the transform block, it can be indicated with the binary value zero (0) in the end-of-block map. If, on the other hand, a non-zero coefficient is the last non-zero coefficient in the transform block, it can be indicated with the binary value one (1) in the end-of-block map. For example, because the quantized transform coefficient corresponding to scan position 11 (i.e., the last non-zero quantized transform coefficient 628) is the last non-zero coefficient of the quantized transform block 604, it is indicated with the end-of-block value 624 of one (1); all other non-zero transform coefficients are indicated with a zero.

[0062] In some examples, encoding the quantized transform block 604 can include generating and encoding a sign map 626. The sign map 626 indicates which non-zero quantized transform coefficients of the quantized transform block 604 are positive and which are negative. Transform coefficients that are zero need not be represented in the sign map. The sign map 626 illustrates the sign map of the quantized transform block 604. In the sign map, negative quantized transform coefficients can be represented by -1, and positive quantized transform coefficients can be represented by one (1).
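The three maps of [0060] to [0062] can be sketched as follows. For simplicity, the sketch computes them over the 1D scan-order array u given in [0067], not over the 2D block drawn in Figure 6. The function names are hypothetical.

```python
def nonzero_map(u):
    """1 where the coefficient is non-zero, 0 where it is zero."""
    return [1 if x != 0 else 0 for x in u]


def end_of_block_map(u):
    """For each non-zero position: 1 if it is the last non-zero coefficient, else 0."""
    last = max((i for i, x in enumerate(u) if x != 0), default=-1)
    return {i: int(i == last) for i, x in enumerate(u) if x != 0}


def sign_map(u):
    """For each non-zero position: -1 for negative coefficients, 1 for positive ones."""
    return {i: (-1 if x < 0 else 1) for i, x in enumerate(u) if x != 0}


u = [-6, 0, -1, 0, 2, 4, 1, 0, 0, 1, 0, -1, 0, 0, 0, 0]  # scan-order array from [0067]
print(nonzero_map(u))
print(end_of_block_map(u))  # only position 11 maps to 1
print(sign_map(u))
```

Zero coefficients are absent from the end-of-block and sign maps, matching the statement that they need not be represented there.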

[0063] Figure 7 is a diagram of a coefficient token tree 700 that can be used to entropy code blocks into a video bitstream according to implementations of this disclosure. The coefficient token tree 700 is called a binary tree because, at each node of the tree, one of two branches must be taken (i.e., traversed). The coefficient token tree 700 includes a root node 701 and a node 703, which correspond to the nodes labeled A and B, respectively.

[0064] As described above with respect to Figure 6, when an end-of-block (EOB) token is detected, coding of the coefficients in the current block can terminate, and the remaining coefficients in the block can be inferred to be zero. Therefore, in a video coding system, coding the EOB position can be an essential part of coding the coefficients.

[0065] In some video coding systems, the binary decision of whether the current token is equal to the EOB token of the current block is coded immediately after a non-zero coefficient is decoded, or at the first scan position (DC). In one example, consider a transform block of size M×N, where M denotes the number of columns and N the number of rows. The maximum number of times this decision is coded is M×N. M and N can take values such as 2, 4, 8, 16, 32, and 64. The binary decision corresponds to coding a "1" bit in the coefficient token tree 700, which moves the traversal from the root node 701 to node 703. Here, "coding a bit" can mean outputting or generating a bit in a codeword that represents the transform coefficient being encoded. Similarly, "decoding a bit" can mean reading a bit (e.g., from the encoded bitstream) of the codeword corresponding to the quantized transform coefficient being decoded. Each such bit corresponds to a branch traversed in the coefficient token tree.

[0066] Using the coefficient token tree 700, a string of binary digits is generated for each quantized coefficient (e.g., coefficients 608 and 610 of Figure 6) of a quantized transform block (such as the quantized transform block 604 of Figure 6).

[0067] In one example, the quantized coefficients in an N×N block (e.g., the quantized transform block 604) are organized into a 1D (one-dimensional) array (here, array u) following a prescribed scan order (e.g., the scan order 602 of Figure 6). N can be 4, 8, 16, 32, or any other value. The quantized coefficient at the i-th position of the 1D array is denoted u[i], where i = 0, ..., N*N-1. The starting position of the last run of zeros in u[0], ..., u[N*N-1] is denoted eob. If u[N*N-1] is not zero (i.e., if the last coefficient of the 1D array u is not zero), eob can be set to the value N*N. Using the example of Figure 6, the 1D array u can have the entries u[] = [-6, 0, -1, 0, 2, 4, 1, 0, 0, 1, 0, -1, 0, 0, 0, 0]. Each u[i] is a quantized transform coefficient. The quantized transform coefficients of the 1D array u may also be referred to herein simply as "coefficients" or "transform coefficients". The coefficient at position i = 0 (i.e., u[0] = -6) corresponds to the DC coefficient. In this example, eob equals 12, because u[12] is zero and there are no non-zero coefficients at or after position 12 of the 1D array u.
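The eob definition in [0067] can be sketched with a hypothetical helper. eob is the index where the final run of zeros starts, or len(u) when the last coefficient is non-zero.

```python
def find_eob(u):
    """Start index of the final run of zeros in u, or len(u) if u[-1] != 0."""
    eob = len(u)
    while eob > 0 and u[eob - 1] == 0:   # walk back over trailing zeros
        eob -= 1
    return eob


u = [-6, 0, -1, 0, 2, 4, 1, 0, 0, 1, 0, -1, 0, 0, 0, 0]
print(find_eob(u))  # 12
```

An all-zero block gives eob = 0, so EOB_TOKEN would be the first token coded, at the DC position.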

[0068] To encode and decode the coefficients u[0], ..., u[N*N-1], a token t[i] is generated at each position i ≤ eob, for i = 0 to N*N-1. For i < eob, the token t[i] can indicate the size and/or range of the corresponding quantized transform coefficient u[i]. The token at eob can be EOB_TOKEN, which indicates that the 1D array u contains no non-zero coefficients at or after the eob position. That is, t[eob] = EOB_TOKEN indicates the EOB position of the current block. Table I below lists example token values, excluding EOB_TOKEN, and their corresponding names according to implementations of this disclosure.

[0069] Table I

[0070]
Token | Token name
0 | ZERO_TOKEN
1 | ONE_TOKEN
2 | TWO_TOKEN
3 | THREE_TOKEN
4 | FOUR_TOKEN
5 | DCT_VAL_CAT1 (5-6)
6 | DCT_VAL_CAT2 (7-10)
7 | DCT_VAL_CAT3 (11-18)
8 | DCT_VAL_CAT4 (19-34)
9 | DCT_VAL_CAT5 (35-66)
10 | DCT_VAL_CAT6 (67-2048)

[0071] In one example, the quantized coefficient values are taken to be signed 12-bit integers. To represent a quantized coefficient value, the range of 12-bit signed values can be divided into 11 tokens (tokens 0 to 10 in Table I) plus the end-of-block token (EOB_TOKEN). The coefficient token tree 700 can be traversed to generate the token that represents a quantized coefficient value. The encoder can then encode the result of traversing the tree (i.e., the bit string) into a bitstream (such as the bitstream 420 of Figure 4), as described with respect to the entropy encoding stage 408 of Figure 4.

[0072] The coefficient token tree 700 includes the following tokens:
EOB_TOKEN (token 702), ZERO_TOKEN (token 704), ONE_TOKEN (token 706), TWO_TOKEN (token 708), THREE_TOKEN (token 710), FOUR_TOKEN (token 712),
CAT1 (token 714, DCT_VAL_CAT1 in Table I), CAT2 (token 716, DCT_VAL_CAT2 in Table I), CAT3 (token 718, DCT_VAL_CAT3 in Table I),
CAT4 (token 720, DCT_VAL_CAT4 in Table I), CAT5 (token 722, DCT_VAL_CAT5 in Table I), and CAT6 (token 724, DCT_VAL_CAT6 in Table I).
As can be seen, the coefficient token tree maps single quantized coefficient values to single tokens, such as one of tokens 704, 706, 708, 710, and 712. Other tokens, such as tokens 714, 716, 718, 720, 722, and 724, represent ranges of quantized coefficient values. For example, a quantized transform coefficient with a value of 37 can be represented by the token DCT_VAL_CAT5 (token 722 of Figure 7).

[0073] The base value of a token is defined as the smallest number in its range. For example, the base value of token 720 is 19. Entropy coding identifies a token for each quantized coefficient. If the token represents a range, entropy coding also forms a residual by subtracting the base value from the quantized coefficient. For example, a quantized transform coefficient with a value of 20 can be represented by including token 720 and the residual value 1 (i.e., 20 minus 19) in the encoded video stream. This allows the decoder to reconstruct the original quantized transform coefficient. The end-of-block token (i.e., token 702) indicates that there are no further non-zero quantized coefficients in the transform block data.
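The token selection of Table I and the residual of [0073] can be sketched as follows. Two assumptions apply. First, the token is chosen from the coefficient magnitude, with the sign assumed to be coded separately. Second, the extra bits used to code the residual are not modeled. The names `tokenize`, `RANGE_TOKENS`, and `SINGLE_TOKENS` are hypothetical.

```python
# (token name, base value, largest value) for the range tokens of Table I.
RANGE_TOKENS = [
    ("DCT_VAL_CAT1", 5, 6),
    ("DCT_VAL_CAT2", 7, 10),
    ("DCT_VAL_CAT3", 11, 18),
    ("DCT_VAL_CAT4", 19, 34),
    ("DCT_VAL_CAT5", 35, 66),
    ("DCT_VAL_CAT6", 67, 2048),
]
SINGLE_TOKENS = ["ZERO_TOKEN", "ONE_TOKEN", "TWO_TOKEN", "THREE_TOKEN", "FOUR_TOKEN"]


def tokenize(coefficient):
    """Return (token name, residual) for a quantized coefficient.

    The token depends only on the magnitude (the sign is assumed to be coded
    separately); the residual is the magnitude minus the token's base value.
    """
    magnitude = abs(coefficient)
    if magnitude < len(SINGLE_TOKENS):
        return SINGLE_TOKENS[magnitude], 0
    for name, base, largest in RANGE_TOKENS:
        if base <= magnitude <= largest:
            return name, magnitude - base
    raise ValueError(f"magnitude {magnitude} is outside the range of Table I")


print(tokenize(20))  # ('DCT_VAL_CAT4', 1)
print(tokenize(37))  # ('DCT_VAL_CAT5', 2)
```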

[0074] A token t[i] can be encoded or decoded using the coefficient token tree 700 together with a binary arithmetic coding engine (such as in the entropy encoding stage 408 of Figure 4). Traversal of the coefficient token tree 700 begins at the root node 701 (i.e., the node labeled A). Traversing the coefficient token tree generates a bit string (codeword), which is encoded into the bitstream using, for example, binary arithmetic coding. The bit string is a representation of the current coefficient (i.e., the quantized transform coefficient being encoded).

[0075] If the current coefficient is zero and there are no non-zero values among the remaining transform coefficients, token 702 (i.e., EOB_TOKEN) is added to the bitstream. This is the case, for example, for the transform coefficient at scan order position 12 of Figure 6. Otherwise, if the current coefficient is non-zero, or if any remaining coefficient of the current block is non-zero, a "1" bit is added to the codeword, and the traversal passes to node 703 (i.e., the node labeled B).
At node B, the current coefficient is tested to see whether it is equal to zero. If it is, the left branch is taken, and token 704, representing ZERO_TOKEN, is added to the bitstream (i.e., a "0" bit is added to the codeword). If it is not, a "1" bit is added to the codeword, and the traversal passes to node C.
At node C, the current coefficient is tested to see whether it is greater than 1. If the current coefficient is equal to one (1), the left branch is taken, and token 706, representing ONE_TOKEN, is added to the bitstream (i.e., a "0" bit is added to the codeword). If the current coefficient is greater than one (1), a "1" bit is added to the codeword, and the traversal passes to node D, where the current coefficient is compared with the value 4.
If the current coefficient is less than or equal to 4, a "0" bit is added to the codeword, and the traversal passes to node E. At node E, an equality test against the value "2" can be performed. If the test is true, token 708, representing the value "2", is added to the bitstream (i.e., a "0" bit is added to the codeword). Otherwise, a "1" bit is added, and at node F the current coefficient is tested against the values "3" and "4". Token 710 (i.e., a "0" bit is added to the codeword) or token 712 (i.e., a "1" bit is added to the codeword) is then added to the bitstream as appropriate, and so on.

[0076] In short, a "0" bit is added to the codeword when traversing to the left child node, and a "1" bit is added to the codeword when traversing to the right child node. The decoder performs a similar process when decoding the codeword from the compressed bitstream. The decoder reads bits from the bitstream. If a bit is "1", it traverses the coefficient token tree to the right, and if a bit is "0", it traverses the tree to the left. The decoder then reads the next bit and repeats the process until the tree traversal reaches a leaf node (i.e., the token). As an example, to encode the token t[i] = THREE_TOKEN, the binary string 111010 is encoded starting from the root node (i.e., root node 701). As another example, decoding the codeword 11100 results in the token TWO_TOKEN.
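The codewords produced by the traversal in [0075] and [0076] can be tabulated and checked with a short sketch. Only the leaves reached through nodes A to F are included. The subtree holding the DCT_VAL_CAT tokens is not described in this text, so it is omitted rather than guessed. The `check_eob` argument anticipates the root-skipping flag of [0079]. All names here are hypothetical.

```python
# Bits from the root node A down to each leaf, per [0075]: left = "0", right = "1".
TOKEN_PATHS = {
    "EOB_TOKEN": "0",         # A -> left
    "ZERO_TOKEN": "10",       # A -> B -> left
    "ONE_TOKEN": "110",       # A -> B -> C -> left
    "TWO_TOKEN": "11100",     # ... C -> D (<= 4) -> E -> left
    "THREE_TOKEN": "111010",  # ... E -> F -> left
    "FOUR_TOKEN": "111011",   # ... E -> F -> right
}
LEAVES = {path: token for token, path in TOKEN_PATHS.items()}


def encode_token(token, check_eob=True):
    """Codeword for token; with check_eob False the root bit is skipped (inferred as 1)."""
    bits = TOKEN_PATHS[token]
    if check_eob:
        return bits
    if token == "EOB_TOKEN":
        raise ValueError("EOB_TOKEN cannot be coded when the root node is skipped")
    return bits[1:]


def decode_token(bits, check_eob=True):
    """Read bits until a leaf is reached; return (token, number of bits consumed)."""
    path = "" if check_eob else "1"   # the skipped root bit is inferred to be 1
    for count, bit in enumerate(bits, start=1):
        path += bit                    # "0" = left child, "1" = right child
        if path in LEAVES:
            return LEAVES[path], count
    raise ValueError("codeword incomplete or outside the part of the tree modeled here")


print(encode_token("THREE_TOKEN"))  # 111010
print(decode_token("11100"))        # ('TWO_TOKEN', 5)
```

Because no path in the table is a prefix of another, the decoder can stop at the first match.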

[0077] Note that the correspondence between the "0" and "1" bits and the left and right child nodes is merely a convention used to describe the encoding and decoding process. In some implementations, for example, different conventions may be used, where "1" corresponds to the left child node and "0" corresponds to the right child node. The process described herein can be applied as long as the encoder and decoder use the same convention.

[0078] Because EOB_TOKEN can occur only after a non-zero coefficient, the decoder can infer that the first bit must be 1 when u[i-1] is zero (i.e., when the quantized transform coefficient at position i-1 of the 1D array u is equal to zero). The first bit must be 1 because of how the tree is traversed for a transform coefficient that follows a zero transform coefficient. For example, the transform coefficient at position 2 of the zigzag scan order of Figure 6 follows the zero transform coefficient at position 1. For such a coefficient, the traversal must move from the root node 701 to node 703.

[0079] Therefore, the binary flag checkEob can be used to instruct the encoder and decoder to skip encoding and decoding the first bit in the coefficient token tree 700 starting from the root node. In practice, when the binary flag checkEob is zero (i.e., indicating that the root node should not be checked), the root node 701 of the coefficient token tree 700 is skipped, and node 703 becomes the first node of the coefficient token tree 700 to be traversed. That is, when the root node 701 is skipped, the encoder can skip encoding, and the decoder can skip decoding and infer the first bit of the encoded string (i.e., the binary bit "1").

[0080] When starting to encode or decode a block, the binary flag checkEob can be initialized to 1 (i.e., indicating that the root node should be checked). The following steps illustrate an example procedure for decoding quantized transform coefficients in an N×N block.

[0081] In step 1, the binary flag checkEob is set to one (i.e., checkEob = 1), and the index i is set to zero (i.e., i = 0).

[0082] In step 2, the token t[i] is decoded in one of two ways:
(1) if the binary flag checkEob is equal to 1, t[i] is decoded using the full coefficient token tree (i.e., starting from the root node 701 of the coefficient token tree 700);
(2) if checkEob is equal to 0, t[i] is decoded using a partial tree (e.g., starting from node 703) in which EOB_TOKEN is skipped.

[0083] In step 3, if token t[i] = EOB_TOKEN, all quantized transform coefficients u[i], ..., u[N*N-1] are zero, and the decoding process terminates. Otherwise, additional bits can be decoded if necessary (i.e., when t[i] is not equal to ZERO_TOKEN), and u[i] can be reconstructed.

[0084] In step 4, if u[i] is equal to zero, the binary flag checkEob is set to 0; otherwise, checkEob is set to 1. That is, checkEob can be set to the value (u[i] != 0).

[0085] In step 5, the index i is incremented (i.e., i = i + 1).

[0086] In step 6, steps 2 through 5 are repeated until all quantized transform coefficients have been decoded (i.e., until the index i = N*N) or until EOB_TOKEN is decoded.
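Steps 1 to 6 can be sketched as a decoding loop. The callback `read_token(check_eob)` is hypothetical. It stands in for the tree traversal and arithmetic decoding of step 2 and for any extra bits of step 3. It returns a token name together with the reconstructed signed coefficient. The token name "COEF" in the usage example is likewise a placeholder for any non-EOB token.

```python
def decode_block(read_token, num_coeffs):
    """Steps 1-6 of [0081]-[0086]: rebuild u, tracking the checkEob flag."""
    u = [0] * num_coeffs
    check_eob = True          # step 1: the root node is checked at the DC position
    i = 0
    while i < num_coeffs:     # step 6: stop after N*N coefficients ...
        token, value = read_token(check_eob)      # step 2
        if token == "EOB_TOKEN":                  # step 3: ... or at EOB_TOKEN,
            break                                 # leaving u[i:] at zero
        u[i] = value
        check_eob = u[i] != 0                     # step 4
        i += 1                                    # step 5
    return u


def make_reader(items):
    """Hypothetical stand-in for the token decoder: replays (token, value) pairs."""
    stream = iter(items)

    def read_token(check_eob):
        token, value = next(stream)
        if token == "EOB_TOKEN" and not check_eob:
            raise ValueError("EOB_TOKEN cannot follow a zero coefficient")
        return token, value

    return read_token


u = [-6, 0, -1, 0, 2, 4, 1, 0, 0, 1, 0, -1, 0, 0, 0, 0]
items = [("COEF", v) for v in u[:12]] + [("EOB_TOKEN", 0)]
print(decode_block(make_reader(items), 16) == u)  # True
```

The EOB at position 12 is accepted because u[11] = -1 is non-zero, so checkEob is 1 at that point.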

[0087] In step 2 above, decoding the token t[i] can include three steps: determining a context ctx, determining a binary probability distribution (i.e., a model) from the context ctx, and decoding the path from the root node to a leaf node of the coefficient token tree 700 with a Boolean arithmetic code that uses the determined probability distribution. The context ctx can be determined using a context derivation method. That method can use one or more of the block size, the plane type (i.e., luminance or chrominance), the position i, and the previously decoded tokens t[0], ..., t[i-1]. Other criteria can also be used to determine the context ctx. A binary probability distribution can be determined for each internal node of the coefficient token tree 700, starting from the root node 701 (when checkEob = 1) or from node 703 (when checkEob = 0).

[0088] In some coding systems, the probabilities used to encode or decode a token t[i] given a context ctx can be fixed and therefore not adapted to the image (i.e., frame). For example, the probabilities can be default values defined for a given context ctx, or they can be coded (e.g., signaled) as part of the frame header of that frame. Coding the probabilities of every context when coding a frame can be costly. Therefore, for each context, the encoder can analyze whether it is beneficial to code the context's associated probabilities in the frame header, and can signal its decision to the decoder using a binary flag. Furthermore, the cost (e.g., in bitrate) of coding the probabilities of a context can be reduced using prediction. The prediction can be derived from the probabilities of the same context in previously decoded frames.

[0089] In some coding systems, instead of traversing a coefficient token tree (such as coefficient token tree 700) to code the transform coefficients, each token can be associated with a value that is coded. That is, rather than coding binary symbols (i.e., symbols selected from the alphabet {0, 1}), an alphabet of more than two symbols is used to code the transform coefficients. In one example, the alphabet includes 12 symbols: {EOB_TOKEN, ZERO_TOKEN, ONE_TOKEN, TWO_TOKEN, THREE_TOKEN, FOUR_TOKEN, DCT_VAL_CAT1, DCT_VAL_CAT2, DCT_VAL_CAT3, DCT_VAL_CAT4, DCT_VAL_CAT5, DCT_VAL_CAT6}. These 12 symbols are also called tokens. Other token alphabets that include more, fewer, or different tokens are possible. An alphabet that includes only the symbols {0, 1} is referred to herein as a binary alphabet. An alphabet that includes symbols other than, or in addition to, {0, 1} is referred to herein as a non-binary alphabet. Each token can be associated with a value. In one example, the value of EOB_TOKEN can be 255, and each of the other tokens can be associated with a different value.

[0090] Figure 8 is a diagram of an example of a tree 800 for binarizing quantized transform coefficients according to implementations of this disclosure. Tree 800 is a binary tree that some video coding systems can use to binarize quantized transform coefficients. Such systems encode and decode quantized transform coefficients using binarization, context modeling, and binary arithmetic coding steps. This process is called context-adaptive binary arithmetic coding (CABAC). For example, to code a quantized transform coefficient x, the coding system can perform the following steps. The quantized transform coefficient x can be any coefficient of the quantized transform block 604 of Figure 6 (e.g., coefficient 608).

[0091] In the binarization step, the coefficient x is first binarized into a binary string using tree 800. The binarization process can binarize the unsigned value of the coefficient x. For example, binarizing coefficient 628 (i.e., the value -1) binarizes the value 1. This traverses tree 800 and generates the binary string 10. Each bit of the binary string 10 is called a bin.

[0092] In the context derivation step, a context is derived for each bin to be coded. The context can be derived from one or more of the following: the block size, the plane type (i.e., luma or chroma), the position of coefficient x within the block, and previously decoded coefficients (e.g., the neighboring coefficients to the left and/or above, if available). Other information can also be used to derive the context.

[0093] In the binary arithmetic coding step, given a context, each bin is coded together with the probability value associated with that context using, for example, a binary arithmetic coding engine.

[0094] Coding the transform coefficients can also include a step called context update. In the context update step, after a bin is coded, the probability associated with its context is updated to reflect the value of the bin.
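This is a minimal sketch of the binarization, context derivation, coding, and context-update steps of [0091] to [0094]. It rests on four assumptions:
- Tree 800 is approximated by a unary binarization with a terminating 0. This reproduces the "1 -> 10" example but may not match tree 800 elsewhere.
- The context is a hypothetical (plane type, bin index) pair.
- The arithmetic coder is replaced by the ideal cost, -log2 of each bin's probability.
- The update uses a fixed rate of 0.95.

```python
import math

ALPHA = 0.95  # assumed fixed adaptation rate, close to the CABAC value


def binarize(x):
    """Assumed unary binarization of |x| with a terminating 0: 0 -> "0", 1 -> "10"."""
    return "1" * abs(x) + "0"


def code_coefficient(x, plane, probs):
    """Binarize x, code each bin with its context's P(bin == 0), then update that context.

    Returns the ideal cost in bits; probs maps context -> P(bin == 0) and is updated in place.
    """
    cost = 0.0
    for k, ch in enumerate(binarize(x)):
        ctx = (plane, min(k, 2))        # hypothetical context: plane type and bin index
        p0 = probs.get(ctx, 0.5)        # unseen contexts start at 0.5
        bin_value = int(ch)
        cost -= math.log2(p0 if bin_value == 0 else 1.0 - p0)   # binary arithmetic coding
        probs[ctx] = ALPHA * p0 + (1 - ALPHA) * (1 - bin_value)  # context update
    return cost


probs = {}
print(code_coefficient(-1, "luma", probs))  # 2.0 bits: both contexts were at 0.5
print(probs)
```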

[0095] As briefly described above, a symbol sequence can be entropy coded by using a probability model to determine the probability p of the sequence. Binary arithmetic coding can then be used to map the sequence to a binary codeword at the encoder and to decode the sequence from the binary codeword at the decoder. The length of the codeword (i.e., its number of bits) is given by equation (2) above. Because the length must be an integer, it is the smallest integer that is greater than or equal to the value computed by equation (2). The efficiency of entropy coding is therefore directly related to the probability model.
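The following is a small worked check of [0095] and [0099]. It assumes that equation (2) is the usual information content, $-\log_2 p$, of the sequence probability p. The sketch combines per-symbol probabilities by summing their logarithms and rounds the result up. The function name is hypothetical.

```python
import math


def code_length_bits(symbol_probs):
    """Codeword length for a sequence, given the probability assigned to each symbol.

    The sequence probability p is the product of the per-symbol probabilities, so
    -log2(p) is the sum of -log2 terms; it is rounded up to a whole number of bits.
    """
    information = -sum(math.log2(q) for q in symbol_probs)
    return math.ceil(information)


print(code_length_bits([0.5, 0.5, 0.25]))   # 4
print(code_length_bits([0.9, 0.9, 0.9]))    # 1
print(code_length_bits([0.1]), code_length_bits([0.2]))  # 4 3
```

The last line illustrates [0099]: the smaller probability 0.1 yields a longer codeword than 0.2.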

[0096] In the following description, for a sequence S of N symbols, the subscript t refers to the symbol at position t in the sequence. For example, if S is a sequence of five (5) binary symbols, such as 11010, then $S_5$ refers to the symbol at the 5th position, i.e., the last 0 in the sequence 11010. Accordingly, the sequence S can be written as $S_1, S_2, \ldots, S_N$.

[0097] In some implementations, a symbol can refer to a token selected from a non-binary token alphabet of two or more tokens. A symbol (i.e., a token) can therefore take any of the available values. The token can be one used to code, and thus indicate, transform coefficients. In this case, the "symbol sequence S" refers to the list of tokens $S_1, S_2, \ldots, S_N$ used to code the transform coefficients at scan positions 1, 2, ..., N in the scan order.

[0098] As used herein, probability values, such as the probability $p_t$ of the current symbol $S_t$, can have floating-point or fixed-point representations. Accordingly, operations applied to these values can use either floating-point or fixed-point arithmetic.

[0099] Consider two estimated probabilities $p_1$ and $p_2$ for the same symbol such that $p_1 < p_2$. The probability $p_1$ results in a codeword that is no shorter than the codeword resulting from $p_2$. That is, a smaller probability usually produces a longer codeword than a larger probability.

[0100] A probability estimation model that is a first-order linear system can be derived largely from equation (3) below. Equation (3) estimates the probability that the symbol at t+1 is 0 or 1 as a weighted combination of the probability estimate for the symbol at t and the observed value of the symbol at t.

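[0101] $$\begin{bmatrix} p_{t+1}(0) \\ p_{t+1}(1) \end{bmatrix} = \alpha \begin{bmatrix} p_t(0) \\ p_t(1) \end{bmatrix} + (1-\alpha) \begin{bmatrix} \mathbb{1}\{s_t = 0\} \\ \mathbb{1}\{s_t = 1\} \end{bmatrix} \qquad (3)$$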

[0102] This is based in part on the following understanding. Let $\vec{p}_t = [p_t(0), p_t(1)]^T$ (i.e., a vector with two elements) denote the probability estimate at time t, where t is the index of the current symbol. Equation (3) then holds, where $p_t(0)$ and $p_t(1)$ are the probabilities that the current symbol $s_t$ is 0 or 1, respectively, and $\mathbb{1}\{\cdot\}$ equals 1 when its condition is true and 0 otherwise. The value α can depend on the specific codec used for the encoding and decoding operations. For example, the probability model can be the probability estimation module of the CABAC framework used in H.264/AVC, as described in Section III.C of D. Marpe et al., "Context-based adaptive binary arithmetic coding in the H.264/AVC video compression standard," IEEE Transactions on Circuits and Systems for Video Technology, vol. 13, no. 7, pp. 620-636, 2003. In this example, α is a constant close to 0.95. In another example, the probability model can be the probability estimation module of AV1. That module is described in P. de Rivaz and J. Haughton, "AV1 bitstream & decoding process specification," Alliance for Open Media, 2018, p. 182, and in Y. Chen et al., "An overview of core coding tools in the AV1 video codec," 2018 Picture Coding Symposium (PCS), IEEE, 2018, pp. 41-45. In this example, the probability update uses an α that adapts with time (i.e., the current symbol index) and with the number of symbols. In either case, there can be a barrier value $P_{barrier}$. If $p_t(0)$ or $p_t(1)$ becomes too small (i.e., the estimate comes too close to 0 or 1, as indicated by defined criteria), that value is raised to $P_{barrier}$. More simply, $P_{barrier}$ prevents the probability estimate from reaching 0. In some examples herein, $P_{barrier}$ is referred to as $P_{62}$.
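Equation (3) with the barrier can be sketched for a single binary context as follows, using the CABAC constants stated in [0111]. This floating-point version tracks p0 = P(next symbol is 0). Real CABAC implementations instead use a finite-state, fixed-point representation, so this is an illustration rather than a bit-exact model. The names are hypothetical.

```python
ALPHA = (0.01875 / 0.5) ** (1 / 63)   # about 0.949, as in [0111]
P_BARRIER = 0.5 * ALPHA ** 62          # P_62, about 0.0198


def cabac_update(p0, symbol, alpha=ALPHA, barrier=P_BARRIER):
    """Equation (3) for one binary context; p0 is P(next symbol == 0)."""
    observed_zero = 1.0 if symbol == 0 else 0.0         # indicator 1{s_t = 0}
    p0 = alpha * p0 + (1.0 - alpha) * observed_zero
    # Clamp so that neither p0 nor p1 = 1 - p0 falls below the barrier.
    return min(max(p0, barrier), 1.0 - barrier)


p0 = 0.5
for _ in range(200):          # a long run of zeros drives p0 toward 1 ...
    p0 = cabac_update(p0, 0)
print(p0 == 1.0 - P_BARRIER)  # ... but the barrier stops it at 1 - P_62: True
```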

[0103] Equation (3) can be regarded as an update rule corresponding to a linear dynamical system for predicting sequential data. It is a first-order linear system. It can be written more generally as equation (4) below, where the observation $u_t$ of the stochastic system at time t is treated as the input.

[0104] $$p_{t+1} = \alpha\,p_t + (1-\alpha)\,u_t \qquad (4)$$

[0105] If another model is incorporated (e.g., to generate the observation $u_t$), the probability estimation model used for entropy coding can instead correspond to a higher-order linear system that produces more accurate results (e.g., lower entropy). In one possible technique, the probability model can include an update algorithm whose update rule uses conditions beyond those of the baseline probability model. For example, in addition to using $p_t$, a conditional probability can be estimated from the τ symbols preceding the current symbol. In this estimation, a list can be used to apply multiple probability updates. In one possible technique, a weighted average of models can be used to create a higher-order linear system. In another possible technique, the update rate can be adaptive, as described in more detail below.

[0106] Each of these techniques can be used alone or in combination for probability estimation, i.e., for entropy coding a symbol sequence. In the examples described below, the symbol sequence input to the entropy coding and update algorithm can be a sequence s of N symbols. The sequence can be associated with any portion of a frame, such as a frame, a segment, a slice, a block, or some other portion of a frame. An example is the binarized data symbols described with respect to Figures 6 to 8.

[0107] Figure 9 is a flowchart of a method 900 for entropy coding a symbol sequence according to the teachings herein. At 902, the symbol sequence is received. In the examples herein, the sequence to be entropy coded is a sequence s of N binary symbols, where $s \in \{0,1\}^N$. Next, a symbol is selected at 904. For example, the current symbol can be the first symbol in the sequence. At 906, the current symbol is entropy coded using a probability. In some implementations, that probability can be one determined by a first, or baseline, probability model among multiple probability models. In other implementations, it can be an updated probability that uses a combination of probability estimates determined by the respective probability models. In either case, the probability for the next symbol can be updated at 908. The probabilities of the baseline probability model and of any other probability model can be updated, and a combination of these estimates can be used to update the probability at 908. This combination is a second-order linear system, which differs from each of the first-order linear systems represented by the individual models. Method 900 checks for remaining symbols at 910 and repeats until no symbols remain to be entropy coded.

[0108] Method 900 is described below with some examples. The first implementation uses a fixed combination of probability estimates to update the probability used to entropy code the symbols of the sequence. A second example, which uses adaptive probability estimation, follows the first.

[0109] The parameters or variables used for entropy coding and probability estimation are defined, initialized, or otherwise determined before, after, or concurrently with receiving the sequence at 902. Because this example uses binary symbols, the probability values can be initialized so that, at the beginning of sequence s, the first symbol is equally likely to be 0 or 1. In this example, multiple probability models may be available for probability estimation, although only two are shown. The probability p is used to entropy code the current symbol. The probability $p_{\text{count}}$ is a first probability estimate, from a first, count-based probability model. The probability $p_{\text{CABAC}}$ is a second probability estimate, from a second, CABAC-based probability model. The parameter mode is selected from the set {0, 1} (mode ∈ {0, 1}). It indicates which of the first and second probability models is the baseline model. In the example described herein, mode = 0, so that the baseline model is the CABAC model.

[0110] In this fixed-combination example, the weight w used to combine the probability estimate of the first probability model with the conditional probability is set to 0.5. In other examples taught herein, it can be adaptive. The variables τ and $t_{\text{thres}}$ are set for reasons described in more detail below. For binary entropy coding, τ is set to 5, but τ can take other values. For example, when performing multi-symbol entropy coding, τ can be set to 8. One use of τ is to define the size L of a List that stores the probability values used to determine conditional probabilities. The List entries are initialized as $\text{List} = [[0, 0]]^L$, where $L = 2^\tau$ (i.e., L pairs of counts, each initialized to zero; L = 32 when τ = 5). As also described below, $t_{\text{thres}}$ can be set to 25, although it can be set to a lower or higher value.

[0111] The value α described for equations (3) and (4) can depend on the specific codec used for the encoding and decoding operations described above. In some examples, α can be a constant, or it can adapt with time and with the number of symbols. In the example below, α is fixed at $\alpha = (0.01875/0.5)^{1/63}$ (approximately 0.95), which is consistent with the CABAC model. As mentioned above, the barrier value $P_{barrier}$ (also referred to as $P_{62}$) can be used to limit p to a minimum value. In this example, $P_{62} = 0.5\,\alpha^{62}$ (approximately 0.02).

[0112] The index (time) t is initialized to 1, indicating that processing begins with the first symbol $s_1$ of the sequence. The received symbols continue to be processed while t is less than or equal to the total number of symbols N. Symbol $s_t$ is entropy coded using p, and the probabilities are then updated as described below. The index t is then incremented to move to the next symbol $s_{t+1}$ in the sequence, if any. Symbol $s_{t+1}$ is entropy coded using the updated probability p. This process continues until all symbols in sequence s have been entropy coded (i.e., entropy encoded or entropy decoded).

[0113] The pseudocode for this outer loop representing entropy coding and probability estimation is shown below.

[0114]

[0115] As can be seen from the pseudocode above, the function ProbUpdate is called after $s_t$ is entropy coded. ProbUpdate takes the following inputs:
- the probabilities $p_{\text{CABAC}}$ and $p_{\text{count}}$;
- the parameter τ;
- the values of the symbols $s_{t-\tau}$ through $s_t$;
- the index t of the current symbol $s_t$;
- the weight w;
- the variable $t_{\text{thres}}$;
- the List;
- the parameter mode.
It returns the probability p, the probabilities $p_{\text{CABAC}}$ and $p_{\text{count}}$, and the entries of the List. More generally, ProbUpdate updates the probability p used to code the next symbol in the symbol sequence.
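The outer loop of [0112] can be sketched for a binary sequence as follows. The update function is passed in, so different estimators can be plugged in. The two update rules in the usage example are placeholders, not the ProbUpdate of this disclosure. Instead of driving a real arithmetic coder, each symbol contributes its ideal cost, -log2 of its probability. All names are hypothetical.

```python
import math
from typing import Callable, Sequence


def entropy_code(symbols: Sequence[int],
                 prob_update: Callable[[float, int, int], float],
                 p0: float = 0.5) -> float:
    """Outer loop of [0112]: code s_t with p, then update p for s_{t+1}.

    p0 is the probability that the current symbol is 0 (0.5 at the start, as in
    [0109]). Returns the ideal cost in bits instead of an actual bitstream.
    """
    bits = 0.0
    for t, s in enumerate(symbols, start=1):             # t = 1, ..., N
        bits -= math.log2(p0 if s == 0 else 1.0 - p0)    # entropy code s_t using p
        p0 = prob_update(p0, s, t)                       # ProbUpdate stand-in
    return bits


def no_update(p0, s, t):
    return p0                                             # never adapts


def fixed_rate(p0, s, t):
    return 0.95 * p0 + 0.05 * (1.0 if s == 0 else 0.0)    # placeholder update rule


zeros = [0] * 100
print(entropy_code(zeros, no_update))    # 100.0
print(entropy_code(zeros, fixed_rate))   # fewer bits: the estimate adapts to the zeros
```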

[0116] In implementations of the teachings herein, the probability estimation update can include two probability estimation models. The first is the CABAC model described previously, represented by $p_{\text{CABAC}}$. The second is a count-based maximum likelihood estimate (MLE) for independent and identically distributed (i.i.d.) sequences, represented by $p_{\text{count}}$. For simplicity, the MLE for i.i.d. sequences is described using binary sequences. Assume $s_1, \ldots, s_t$ are i.i.d. Bernoulli, where the probability of 0 is p, and there is no prior preference for p, i.e., the prior on p is U[0,1]. If 0 occurs k times and 1 occurs l times in the observed sequence, the estimator satisfies equation (5) below.

[0117]

[0118] The estimated probability $p_{\text{count}}$ is then given by equation (6) below.

[0119]
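Equations (5) and (6) are not legible in this copy, so the counting estimator below is an assumption. It uses the posterior mean that a U[0,1] prior yields after k zeros and l ones, $(k+1)/(k+l+2)$ (Laplace's rule of succession). If equation (6) instead uses the raw frequency $k/(k+l)$, only the return expression changes. The class name is hypothetical.

```python
class CountEstimator:
    """Counting-based estimate of P(next symbol == 0) for an i.i.d. binary sequence."""

    def __init__(self):
        self.k = 0   # number of zeros observed so far
        self.l = 0   # number of ones observed so far

    def update(self, symbol):
        if symbol == 0:
            self.k += 1
        else:
            self.l += 1

    def p0(self):
        # Posterior mean of p under a U[0, 1] prior (assumed form of equation (6)):
        # 0.5 before any data, and never exactly 0 or 1.
        return (self.k + 1) / (self.k + self.l + 2)


est = CountEstimator()
for s in [0, 0, 1, 0]:
    est.update(s)
print(est.p0())  # (3 + 1) / (4 + 2), about 0.667
```

Unlike the fixed-rate CABAC update, this estimator weights every past symbol equally, which suits a stationary source but adapts slowly when the statistics drift.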

[0120] These models, along with others herein, can be referred to as a first probability model, a second probability model, and so on, to distinguish them without implying an order of execution. Whether mode indicates that the baseline model is the CABAC model or the MLE model, the updated probability can be obtained by also considering the conditional probability given previous symbols. The conditional probability $p_{\text{cond}}$ is estimated using a List of adjustable size $2^\tau$ that stores all possible context sequences $s_{t-1}, \ldots, s_{t-\tau}$. The List acts as a hash table of conditions that stores conditional probabilities. When a symbol occurs, its τ preceding symbols are treated as its context. The corresponding context entry in the List is accessed, and its count is updated. The probability estimate is the relative frequency. Until the number of coded symbols is greater than τ (i.e., until t > τ), the baseline estimate ($p_{\text{CABAC}}$ or $p_{\text{count}}$) can be output as the probability p.

[0121] When the corresponding List entry has too few counts, the estimate may be inaccurate. There are at least two possible solutions. First, the condition has length τ, which, as mentioned above, can vary with the number of symbols. When the List entry has few counts, a shorter history of length τ-1, τ-2, and so on can be considered. This involves taking the union of counts across multiple dictionary entries, and the probability estimate is recorded whenever the count of this union reaches the threshold $t_{\text{thres}}$. For example, this might merge 00000 and 00001 into 0000. Second, if the total count in the List is not large enough, the baseline estimate ($p_{\text{CABAC}}$ or $p_{\text{count}}$) can be output as the probability p.
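The following sketches the List of [0120] and the first back-off strategy of [0121] for binary symbols. It rests on three assumptions:
- Contexts are tuples ordered from the most recent symbol to the oldest.
- Shortening the history drops the oldest symbol, so 00000 and 00001 merge into 0000.
- `estimate` returns no conditional estimate when even a one-symbol history has fewer than $t_{\text{thres}}$ counts. The caller then outputs the baseline estimate.

The class and method names are hypothetical.

```python
class ContextList:
    """Table of 2**tau entries, indexed by the tau preceding binary symbols,
    each holding counts [number of zeros, number of ones] for the next symbol."""

    def __init__(self, tau=5, t_thres=25):
        self.tau, self.t_thres = tau, t_thres
        self.counts = [[0, 0] for _ in range(2 ** tau)]   # List = [[0, 0]]^L, L = 2**tau

    @staticmethod
    def index(context):
        # context[0] is the most recent symbol and becomes the most significant bit.
        idx = 0
        for bit in context:
            idx = (idx << 1) | bit
        return idx

    def update(self, context, symbol):
        self.counts[self.index(context)][symbol] += 1

    def estimate(self, context):
        """Return (p_cond, total) with p_cond = P(next == 0), backing off to shorter histories."""
        for m in range(self.tau, 0, -1):         # history length tau, tau - 1, ..., 1
            shift = self.tau - m
            prefix = self.index(context[:m])
            zeros = ones = 0
            for rest in range(2 ** shift):       # union of entries sharing the m newest symbols
                c = self.counts[(prefix << shift) | rest]
                zeros += c[0]
                ones += c[1]
            total = zeros + ones
            if total >= self.t_thres:
                return zeros / total, total      # relative frequency
        return None, 0                           # caller falls back to the baseline estimate


cl = ContextList(tau=2, t_thres=3)
cl.update((0, 0), 0)
cl.update((0, 0), 0)
cl.update((0, 1), 1)
print(cl.estimate((0, 0)))  # history 00 has only 2 counts, so it backs off to history 0
print(cl.estimate((1, 1)))  # (None, 0)
```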

[0122] The mode input allows the user to choose whether the function ProbUpdateCABAC (corresponding to the CABAC model) or ProbUpdateCount (corresponding to the MLE model) generates the baseline probability estimate. The output is then the average (weight w = 0.5) of that baseline estimate and the conditional probability estimate p_cond, which provides a stabilized version of the output. Averaging is not equivalent to simply changing the update rate (analogous to α in CABAC): averaging two fixed-rate update algorithms yields second-order linear dynamics that are fundamentally different from a first-order update.

[0123] That is, returning to equation (4), the weighted average of two probability updates can be written as follows.

[0124] $q_{t+1} = a q_t + (1-a) u_t$

[0125] $r_{t+1} = b r_t + (1-b) u_t$

[0126] $p_t = w q_t + (1-w) r_t$

[0127] Eliminating q and r from these equations yields the following equation (7), a second-order system that includes CABAC as the special case a = b = 0.95.

[0128] $p_{t+1} = (a+b) p_t - ab\, p_{t-1} + \big(w(1-a) + (1-w)(1-b)\big) u_t + \big(ab - (1-w)a - wb\big) u_{t-1} \quad (7)$

[0129] In general, this second-order system cannot be reduced to a first-order system involving only $p_{t+1}$, $p_t$, and $u_t$.
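As an illustrative check (not part of the original disclosure), the following sketch runs the two first-order updates q and r with rates a and b, averages them with weight w, and confirms numerically that the averaged sequence satisfies the second-order recurrence of equation (7). The initial values q_0 = r_0 = 0.5, the 0/1 input u_t, and the function names are assumptions for the demonstration.

```python
import random
from typing import List


def simulate(a: float, b: float, w: float, u: List[int]) -> List[float]:
    """Run q (rate a) and r (rate b) on inputs u; return p_0..p_n with p_t = w q_t + (1-w) r_t."""
    q = r = 0.5                          # assumed initial estimates
    p = []
    for u_t in u:
        p.append(w * q + (1 - w) * r)    # p_t
        q = a * q + (1 - a) * u_t        # q_{t+1}
        r = b * r + (1 - b) * u_t        # r_{t+1}
    p.append(w * q + (1 - w) * r)        # p_n
    return p


def max_residual_eq7(a: float, b: float, w: float, u: List[int], p: List[float]) -> float:
    """Largest |p_{t+1} - RHS of equation (7)| over t = 1..n-1."""
    c_u = w * (1 - a) + (1 - w) * (1 - b)
    c_u1 = a * b - (1 - w) * a - w * b
    worst = 0.0
    for t in range(1, len(u)):
        rhs = (a + b) * p[t] - a * b * p[t - 1] + c_u * u[t] + c_u1 * u[t - 1]
        worst = max(worst, abs(p[t + 1] - rhs))
    return worst


random.seed(0)
u = [random.randint(0, 1) for _ in range(1000)]
p = simulate(a=0.95, b=0.8, w=0.5, u=u)
print(max_residual_eq7(0.95, 0.8, 0.5, u, p) < 1e-9)  # True
```

Because the residual stays at rounding-error level for a ≠ b, the sketch also illustrates why the averaged update cannot be replaced by a single first-order update with some other rate.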

[0130] The probability update described above uses a fixed (e.g., linear) combination of update algorithms together with context-based probability estimation. An example function, ProbUpdate, that implements the second-order system described above is shown in the pseudocode below.

[0131]

[0132]

[0133] In short, after s_t is entropy coded using p in the outer loop, the function ProbUpdate is called, and the probability estimation models available as baseline models generate their corresponding estimated probabilities (in this example, the CABAC estimate and the count-based estimate). Then t_tmp, which is used to collect the counts in the dictionary, is initialized to 0. Next, the algorithm counts and merges the probabilities in the dictionary, where i ranges over each possible outcome of the random symbol and the summation counts the outcomes already observed within a condition window. This counting and merging ends at the second "end if". The next part of the code checks whether the dictionary is large enough (i.e., whether t_tmp > 0). If it is, the probability estimate is updated according to the baseline model selected by the value of mode, so that the probability is updated using one of two calculations: depending on whether mode = 0 or mode = 1, the updated probability p combines the conditional estimate with either the CABAC-based or the count-based baseline estimate. Conversely, if the dictionary is not large enough (i.e., t_tmp > 0 is false), the baseline estimate selected by mode (CABAC-based or count-based) is used as the updated probability.

[0134] Then p, the baseline estimates, and List are returned, so that p can be used to entropy code the next symbol s_{t+1}, the baseline estimates can be updated after s_{t+1} is entropy coded, and List can be used to optionally generate the conditional probability p_cond after s_{t+1} is entropy coded.
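The ProbUpdate pseudocode itself is not reproduced here. The following is a minimal Python sketch of the flow described in [0133] and [0134], under stated assumptions: binary symbols, a CABAC-like update p ← αp + (1 − α)u with an assumed α = 0.95, the add-one count estimate for the MLE baseline, mode = 0 selecting the CABAC-like baseline and mode = 1 the count-based one (an assumed mapping), and the w = 0.5 average with p_cond from [0122]. The class and function names are hypothetical, and this simplified version omits the context back-off.

```python
from collections import defaultdict, deque
from dataclasses import dataclass, field
from typing import Deque, Dict, List, Tuple


@dataclass
class ProbState:
    """Hypothetical state carried between calls (baselines plus the context List)."""
    tau: int = 3
    alpha: float = 0.95     # CABAC-like adaptation factor (assumed value)
    p_cabac: float = 0.5    # CABAC-style baseline estimate of P(s == 0)
    n0: int = 0             # zeros seen so far (count-based baseline)
    n1: int = 0             # ones seen so far
    table: Dict[Tuple[int, ...], List[int]] = field(
        default_factory=lambda: defaultdict(lambda: [0, 0]))
    history: Deque[int] = field(default_factory=deque)


def prob_update(s_t: int, st: ProbState, mode: int, w: float = 0.5) -> float:
    """Update state after coding s_t and return P(next symbol == 0)."""
    # Baseline models: a fixed-rate (CABAC-like) update and an add-one count.
    u = 1.0 if s_t == 0 else 0.0
    st.p_cabac = st.alpha * st.p_cabac + (1 - st.alpha) * u
    if s_t == 0:
        st.n0 += 1
    else:
        st.n1 += 1
    p_count = (st.n0 + 1) / (st.n0 + st.n1 + 2)
    baseline = st.p_cabac if mode == 0 else p_count   # assumed mode mapping

    # Record s_t under its length-tau context, then slide the window.
    if len(st.history) == st.tau:
        st.table[tuple(st.history)][s_t] += 1
    st.history.append(s_t)
    if len(st.history) > st.tau:
        st.history.popleft()

    # t_tmp > 0 means the next symbol's context has been seen before.
    counts = st.table.get(tuple(st.history)) if len(st.history) == st.tau else None
    t_tmp = sum(counts) if counts is not None else 0
    if t_tmp > 0:
        p_cond = counts[0] / t_tmp
        return w * baseline + (1 - w) * p_cond
    return baseline


st = ProbState(tau=2)
for s in [0, 1, 0, 1, 0, 1, 0, 1]:
    p = prob_update(s, st, mode=1)
print(p)  # 0.75: count baseline 0.5 averaged with p_cond = 1.0 for context (0, 1)
```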

[0135] The function ProbUpdateCABAC called by ProbUpdate can be represented by the following pseudocode, which implements the CABAC update described above, where p is the probability distribution vector [p(1-σ), p(σ)].

[0136]
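As a hedged sketch of what ProbUpdateCABAC computes (the pseudocode is not reproduced here), the following uses the usual floating-point model of CABAC: in H.264/AVC CABAC the 64 probability states are spaced by the factor α = (0.01875/0.5)^(1/63) ≈ 0.95, so each update decays the current estimate by α and moves the observed symbol's probability toward 1. The function name and the vector convention [P(0), P(1)] are assumptions.

```python
from typing import List

# Adaptation factor of H.264/AVC CABAC's 64-state table, about 0.9492.
ALPHA_CABAC = (0.01875 / 0.5) ** (1 / 63)


def prob_update_cabac(p: List[float], s_t: int, alpha: float = ALPHA_CABAC) -> List[float]:
    """Exponential-decay update of a probability vector p after observing s_t.

    Every entry decays by alpha and the observed symbol gains (1 - alpha), so
    the entries still sum to 1. The real codec tracks a table index instead.
    """
    return [alpha * p_i + (1 - alpha) * (1.0 if i == s_t else 0.0)
            for i, p_i in enumerate(p)]


p = [0.5, 0.5]
for s in [0, 0, 0]:
    p = prob_update_cabac(p, s)
print(p)  # P(0) rises toward 1; P(1) = 0.5 * ALPHA_CABAC**3
```

Passing α as a parameter mirrors the note in [0169] that α is an input to ProbUpdateCABAC and could therefore be made adaptive.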

[0137] The function ProbUpdateCount called by ProbUpdate can be represented by the following pseudocode, which implements the MLE calculation described above, where p is the probability distribution vector given the outcome value s_t.

[0138]
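A minimal sketch of a count-based update in the spirit of ProbUpdateCount, assuming the add-one (uniform-prior) form discussed for equation (5) and generalizing it to N symbols; the function name and the returned (probabilities, counts) pair are hypothetical.

```python
from typing import List, Tuple


def prob_update_count(counts: List[int], s_t: int) -> Tuple[List[float], List[int]]:
    """Add s_t to the symbol counts and return (probability vector, new counts).

    Assumed add-one form p_i = (c_i + 1) / (n + N) for N symbols, which for
    N = 2 is (k + 1) / (k + l + 2).
    """
    new_counts = list(counts)
    new_counts[s_t] += 1
    denom = sum(new_counts) + len(new_counts)
    return [(c + 1) / denom for c in new_counts], new_counts


counts = [0, 0]
for s in [0, 0, 1]:
    p, counts = prob_update_count(counts, s)
print(p, counts)  # [0.6, 0.4] [2, 1]
```

Unlike the CABAC-style update, every past symbol carries equal weight here, which suits stationary (iid) sources but adapts slowly when statistics drift.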

[0139] Other update algorithms for context-based probability estimation are possible. For example, instead of the fixed combination described above, the linear combination can be learned with a data-driven approach. An implementation of entropy coding with such adaptive (rather than fixed) probability estimation is described next.

[0140] In this implementation, instead of creating a higher-order linear system by estimating conditional probabilities from a List of previously seen contexts, different first-order linear models (CABAC, counting (MLE), AV1, etc.) are used as kernels, and a linear combination of their outputs is produced, with the combination learned adaptively. While these three models are used in this example, any probability estimation algorithm can be used. Let n_p denote the number of kernels; each row of the kernel-estimate matrix is one kernel's probability estimate, and the weights/parameters w define the linear combination. In other words, a weighted average of simple (first-order) probability estimates is used to entropy code the next symbol.

[0141]

[0142] Each row is updated using its own probability update algorithm, and the first row, p(1,:), is fixed as the AV1 output. In this way, the AV1 model/algorithm corresponds to the case where the weight on that row is 1 and all other weights w_i are 0. This can serve as the initialization of the linear weights in the pseudocode described below, and for this reason the AV1 model can be referred to as the baseline model.
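The weighted combination of kernel outputs can be sketched as follows. Rows are zero-indexed here, so row 0 plays the role of the text's p(1,:); the kernel values and function names are illustrative only.

```python
from typing import List


def combine(P: List[List[float]], w: List[float]) -> List[float]:
    """Weighted average w^T P: row i of P is kernel i's probability vector."""
    n_sym = len(P[0])
    return [sum(w_i * row[j] for w_i, row in zip(w, P)) for j in range(n_sym)]


def baseline_weights(n_p: int) -> List[float]:
    """Weights that reproduce row 0 (the AV1 kernel here) exactly: w = (1, 0, ..., 0)."""
    return [1.0] + [0.0] * (n_p - 1)


P = [[0.7, 0.3],   # row 0: AV1 kernel (baseline)
     [0.6, 0.4],   # row 1: e.g. a CABAC-style kernel
     [0.9, 0.1]]   # row 2: e.g. a count-based kernel
print(combine(P, baseline_weights(len(P))))   # [0.7, 0.3]
```

Starting from these weights means the adaptive scheme initially behaves exactly like the baseline and only departs from it as the weights are learned.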

[0143] Next, w is updated. Because every update algorithm chosen as a kernel outputs a valid probability distribution, w can be constrained so that w ≥ 0, which guarantees that the combined probability estimate is non-negative. In addition, the constraint 1^T w = 1 ensures that the combined probabilities sum to 1. Stochastic gradient descent (SGD) is used to update w. For each s_t, the entropy is computed as follows.

[0144]

[0145] The gradient with respect to w is as follows.

[0146]
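The entropy and gradient equations are not reproduced here. As a sketch, assume the per-symbol loss is the ideal code length of s_t under the mixture, −log2(Σ_i w_i P[i][s_t]); its gradient with respect to w then follows directly. The function name is hypothetical.

```python
import math
from typing import List, Tuple


def loss_and_grad(P: List[List[float]], w: List[float], s_t: int) -> Tuple[float, List[float]]:
    """Code length in bits of s_t under the mixture w^T P, and its gradient in w.

    loss(w) = -log2(q), with q = sum_i w_i * P[i][s_t];
    d loss / d w_i = -P[i][s_t] / (q * ln 2).
    """
    q = sum(w_i * row[s_t] for w_i, row in zip(w, P))
    loss = -math.log2(q)
    grad = [-row[s_t] / (q * math.log(2)) for row in P]
    return loss, grad


P = [[0.7, 0.3], [0.2, 0.8]]
loss, grad = loss_and_grad(P, [0.5, 0.5], s_t=1)   # mixture q = 0.55
print(loss, grad)
```

The gradient is most negative for the kernel that assigned the highest probability to the observed symbol, so a descent step shifts weight toward it.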

[0147] At time t, the step size η_t = η0/t is used, which is the standard step size for SGD. More generally, η_t = η0/t^r with r ∈ (0, 1) can be used, where r ∈ (1/2, 1) corresponds to the classical stochastic approximation regime. Then w is updated with a gradient step of size η_t.

[0148] Alternatively, a fixed step size η = η0 can be used to obtain the iterates, and the parameters w_t used in the final probability estimate are then derived from these inner-loop iterates (for example, by averaging them). This suppresses the noise in SGD that would otherwise call for a decreasing step size or averaged gradients. Linear dynamics over the iterates are also proposed as a faster update mechanism. This is the process illustrated in the pseudocode below.
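The pseudocode is not reproduced here. As a hedged sketch of [0147] and [0148], the following runs projected SGD with three ways of producing the weights that are actually used: the decreasing step η_t = η0/t^r, a fixed step with averaged iterates, and a fixed step with first-order linear dynamics w ← βw + (1 − β)w̃, using β = 0.95 from the initialization in [0164]. The exact averaging and dynamics of the original are assumptions, as are the function names; the projection is passed in as a parameter (for example, a projection onto {w ≥ 0, 1^T w = 1}).

```python
from typing import Callable, List

Vector = List[float]


def step_size(t: int, eta0: float, r: float) -> float:
    """Decreasing step eta_t = eta0 / t**r (r = 1 is the standard SGD schedule)."""
    return eta0 / t ** r


def run_sgd(grad: Callable[[Vector, int], Vector],
            project: Callable[[Vector], Vector],
            w0: Vector, steps: int, mode: str,
            eta0: float = 5.0, r: float = 1.0, beta: float = 0.95) -> Vector:
    """Projected SGD; returns the weights used for the probability estimate.

    "decreasing": use the iterate, with step eta0 / t**r.
    "average":    fixed step eta0; use the running mean of the iterates.
    "dynamic":    fixed step eta0; use beta * previous + (1 - beta) * iterate.
    """
    it = list(w0)      # inner-loop iterate
    used = list(w0)    # weights actually used for the estimate
    for t in range(1, steps + 1):
        g = grad(it, t)
        eta = step_size(t, eta0, r) if mode == "decreasing" else eta0
        it = project([w_i - eta * g_i for w_i, g_i in zip(it, g)])
        if mode == "decreasing":
            used = it
        elif mode == "average":
            used = [((t - 1) * u_i + x_i) / t for u_i, x_i in zip(used, it)]
        elif mode == "dynamic":
            used = [beta * u_i + (1 - beta) * x_i for u_i, x_i in zip(used, it)]
        else:
            raise ValueError(f"unknown mode: {mode}")
    return used


c = [0.2, 0.8]


def toy_grad(w: Vector, t: int) -> Vector:
    """Gradient of the stand-in objective 0.5 * ||w - c||^2."""
    return [w_i - c_i for w_i, c_i in zip(w, c)]


def identity(v: Vector) -> Vector:
    return v


for m in ("decreasing", "average", "dynamic"):
    print(m, run_sgd(toy_grad, identity, [1.0, 0.0], 1000, m, eta0=0.5))
```

The toy objective and the identity projection only demonstrate that all three modes converge; in the coding setting the gradient would come from the code-length loss and the projection would enforce the simplex constraints.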

[0149] Updating the weights w can include a constrained optimization (projection) step, which can be slow to solve. To reduce the number of times this step is invoked, a batch version of the algorithm can be used: in each epoch, batches of increasing size (1, 4, 9, 16, ...) are taken, and the gradients within each batch are averaged. w is updated only at the end of each batch, with a fixed step size η0. Theoretically and empirically, SGD and the batch version have similar convergence rates.
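A minimal sketch of the batch schedule, with hypothetical names: batch sizes follow the squares 1, 4, 9, 16, ..., gradients are averaged within a batch while w is held fixed, and the (possibly expensive) projection runs only once per batch. A stand-in quadratic objective and an identity projection keep the example self-contained.

```python
from typing import Callable, Iterator, List, Tuple

Vector = List[float]


def batch_sizes() -> Iterator[int]:
    """Batch sizes 1, 4, 9, 16, ... (k**2 in epoch k)."""
    k = 1
    while True:
        yield k * k
        k += 1


def batch_sgd(grad: Callable[[Vector, int], Vector],
              project: Callable[[Vector], Vector],
              w0: Vector, n_symbols: int, eta0: float) -> Tuple[Vector, int]:
    """Average per-symbol gradients over each batch; update (and project) only at batch ends.

    Returns the final weights and the number of projection calls made.
    """
    w = list(w0)
    t = 0
    updates = 0
    sizes = batch_sizes()
    while t < n_symbols:
        size = next(sizes)
        acc = [0.0] * len(w)
        seen = 0
        while seen < size and t < n_symbols:
            g = grad(w, t)                       # w is held fixed within the batch
            acc = [a + g_i for a, g_i in zip(acc, g)]
            seen += 1
            t += 1
        w = project([w_i - eta0 * a / seen for w_i, a in zip(w, acc)])
        updates += 1
    return w, updates


def toy_grad(w: Vector, t: int) -> Vector:
    """Gradient of 0.5 * ||w - c||^2 with c = (0.2, 0.8) (a stand-in objective)."""
    return [w[0] - 0.2, w[1] - 0.8]


w, updates = batch_sgd(toy_grad, lambda v: v, [1.0, 0.0], n_symbols=30, eta0=0.5)
print(updates)  # 4 projection calls (batches 1 + 4 + 9 + 16 = 30) instead of 30
```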

[0150] Next, a fast algorithm for approximately solving this optimization problem is proposed. The problem can be defined by the following equation.

[0151]

[0152] Simplifying the notation yields the following equation.

[0153]

[0154] The optimality conditions can be obtained from the Lagrangian, as in the following equation.

[0155]

[0156] The Karush-Kuhn-Tucker (KKT) conditions are as follows.

[0157]

[0158]

[0159] The optimal value of x is given by the following equation.

[0160]

[0161] Therefore, μ* can be obtained by solving the following equation, after which x* = max(y - μ*1, 0).

[0162]

[0163] Note that the above equation is a one-dimensional convex optimization problem, which can be solved by binary search.
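Because the equations in [0150] to [0162] are not reproduced here, the following sketch assumes the problem is the Euclidean projection of a point y onto the probability simplex, which matches the stated solution form x* = max(y - μ*1, 0) and the constraints w ≥ 0, 1^T w = 1 from [0143]. It finds μ* by bisection, as [0163] suggests; the function name and the fixed iteration count (which avoids stalling on floating-point ties) are illustrative choices.

```python
from typing import List


def project_simplex(y: List[float], iters: int = 100) -> List[float]:
    """Euclidean projection of y onto {x : x >= 0, sum(x) = 1}.

    Uses x* = max(y - mu* 1, 0), where mu* solves g(mu) = sum_i max(y_i - mu, 0) = 1.
    g is continuous and non-increasing, so bisection on mu finds the root.
    """
    lo = min(y) - 1.0   # g(lo) >= len(y) >= 1
    hi = max(y)         # g(hi) = 0 < 1
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if sum(max(y_i - mid, 0.0) for y_i in y) > 1.0:
            lo = mid    # too much mass left: the root lies to the right
        else:
            hi = mid
    mu = 0.5 * (lo + hi)
    return [max(y_i - mu, 0.0) for y_i in y]


print(project_simplex([2.0, 0.0]))        # approximately [1.0, 0.0]
print(project_simplex([0.5, 0.5, 0.5]))   # approximately [1/3, 1/3, 1/3]
```

Each bisection step costs one pass over the n_p weights, so the projection stays cheap even when it must run after every symbol.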

[0164] The data-driven method for learning linear combinations described above can be represented by the following pseudocode, where the input is a sequence of binary symbols as in the previous implementation. As in the previous implementation, the first step is initialization. During initialization, the variable n_p is set to 18, and the probability estimates and the variable α are initialized. Other variables are initialized as follows: η0 = 5, r = 1 (or r ∈ (1/2, 1)), b_ = b = 0, β = 0.95, and α_min = 0.84. The algorithm selects a mode from SGD with decreasing step size, SGD with parameter averaging, SGD with parameter dynamics, or batch SGD. Notably, when the mode is SGD with decreasing step size, the projection can be solved using the fast projection algorithm described above.

[0165] As with the fixed probability estimation described above, the parameters or variables used for entropy coding and probability estimation are defined, initialized, or otherwise determined before, after, or concurrently with receiving the sequence at 902. The remaining steps of method 900 are then performed according to the following pseudocode, starting with receiving the first symbol s_1 and entropy coding it. The probability estimates are then updated using the corresponding models. The functions ProbUpdateCount and ProbUpdateCABAC have been discussed above; the function ProbUpdateAV1 is described below. Once the probability estimates are updated, they are combined using the selected mode.

[0166]
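The pseudocode for method 900 is not reproduced here. The following hypothetical sketch strings the pieces together for the "SGD with decreasing step size" mode: each symbol is coded with the current mixture probability (its ideal code length is accumulated), w takes a projected SGD step with η_t = η0/t^r (η0 = 5 and r = 1 from [0164]), and every kernel is then updated. The kernel set (CABAC-style estimators at caller-chosen α values plus one add-one count estimator, with no AV1 kernel) is an assumption made to keep the sketch short; the original uses n_p = 18 kernels whose makeup is not stated here.

```python
import math
from typing import List


def project_simplex(y: List[float], iters: int = 100) -> List[float]:
    """Euclidean projection onto {w >= 0, sum(w) = 1} by bisection on the shift mu."""
    lo, hi = min(y) - 1.0, max(y)
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if sum(max(v - mid, 0.0) for v in y) > 1.0:
            lo = mid
        else:
            hi = mid
    mu = 0.5 * (lo + hi)
    return [max(v - mu, 0.0) for v in y]


def code_sequence(seq: List[int], alphas: List[float],
                  eta0: float = 5.0, r: float = 1.0) -> float:
    """Total ideal code length (bits) of a binary sequence under an adaptive mixture.

    Kernels: one CABAC-style estimator per alpha, plus an add-one count estimator.
    Row 0 is the baseline kernel, so the weights start at (1, 0, ..., 0).
    """
    kernels = [[0.5, 0.5] for _ in alphas] + [[0.5, 0.5]]
    counts = [0, 0]
    w = [1.0] + [0.0] * (len(kernels) - 1)
    bits = 0.0
    for t, s in enumerate(seq, start=1):
        # 1) Code s with the current mixture probability.
        q = sum(w_i * k[s] for w_i, k in zip(w, kernels))
        bits -= math.log2(q)
        # 2) Projected SGD step on -log2(q) with eta_t = eta0 / t**r.
        grad = [-k[s] / (q * math.log(2)) for k in kernels]
        eta = eta0 / t ** r
        w = project_simplex([w_i - eta * g for w_i, g in zip(w, grad)])
        # 3) Update every kernel with the observed symbol.
        for i, a in enumerate(alphas):
            kernels[i] = [a * p + (1 - a) * (1.0 if j == s else 0.0)
                          for j, p in enumerate(kernels[i])]
        counts[s] += 1
        kernels[-1] = [(c + 1) / (sum(counts) + 2) for c in counts]
    return bits


seq = [0] * 40 + [1] * 40
print(code_sequence(seq, alphas=[0.95, 0.84]))
```

Because the decoder can repeat steps 2 and 3 after decoding each symbol, it reproduces the same weights and probabilities without any side information.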

[0167] The function ProbUpdateAV1 can be represented by the following pseudocode, which implements the AV1 computation described above, where p is the probability distribution vector given the outcome value s_t.

[0168]

[0169] Note that in this example NumOfSyms (the number of symbols) is 2, but it can be larger. Also note that α is passed as an input to ProbUpdateCABAC; although it is a constant in these examples, passing it as an input allows the value to be adaptive.
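The ProbUpdateAV1 pseudocode is not reproduced here. The sketch below is a floating-point approximation of AV1's symbol adaptation as described in the AV1 specification: the adaptation rate is 3 + (count > 15) + (count > 31) + min(floor(log2(N)), 2), with a per-context counter that saturates at 32. AV1 itself updates 15-bit integer CDFs, so this models the behavior rather than being bit-exact, and the function signature is hypothetical. Like the sketch's `len(p)`, NumOfSyms can exceed 2.

```python
from typing import List, Tuple


def prob_update_av1(p: List[float], count: int, s_t: int) -> Tuple[List[float], int]:
    """Floating-point sketch of AV1-style adaptation for NumOfSyms = len(p) symbols.

    rate = 3 + (count > 15) + (count > 31) + min(floor(log2(N)), 2); every
    probability moves toward the one-hot target by 2**-rate, so early symbols
    adapt fast and the rate slows as the (saturating) counter grows.
    """
    n = len(p)
    rate = 3 + (count > 15) + (count > 31) + min(n.bit_length() - 1, 2)
    step = 2.0 ** -rate
    new_p = [p_i + ((1.0 if i == s_t else 0.0) - p_i) * step
             for i, p_i in enumerate(p)]
    return new_p, min(count + 1, 32)   # counter saturates at 32


p, count = [0.5, 0.5], 0
p, count = prob_update_av1(p, count, 0)
print(p, count)  # [0.53125, 0.46875] 1

q, c = [0.5, 0.5], 0
for _ in range(40):
    q, c = prob_update_av1(q, c, 1)
print(c)  # 32: the counter has saturated
```

The count-dependent rate gives AV1's estimator a fast start followed by slower, more stable adaptation, which is why it makes a reasonable baseline kernel.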

[0170] The table below shows the entropy produced for various binary sequences by the different context-based probability estimation techniques described herein. The table compares six techniques on nine test sequences. The traditional CABAC and AV1 models/algorithms serve as baselines against which the proposed models/algorithms are compared. As shown in the leftmost column, the proposed models compared are SGD without batching, batch SGD, the fixed-combination update algorithm with context-based probability estimation described above using the parameters/variables described above, and the same fixed-combination algorithm with mode set to 1 instead of 0. In most cases, the proposed algorithms outperform the baselines. The differences are generally related to the parameter p_62 in CABAC: when this parameter is used, overly sparse data results in poor entropy.

[0171]
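The table is not reproduced here. Assuming the reported "entropy" is the average ideal code length that a sequential estimator assigns to a test sequence, it can be computed for any predictor as follows; the function names and the sample sequence are hypothetical.

```python
import math
from typing import Callable, List


def empirical_entropy(seq: List[int], predict: Callable[[List[int]], float]) -> float:
    """Average ideal code length (bits/symbol) of sequential predictions.

    predict(past) returns the model's P(next == 0) given the symbols coded so far.
    """
    total = 0.0
    for t, s in enumerate(seq):
        p0 = predict(seq[:t])
        total -= math.log2(p0 if s == 0 else 1.0 - p0)
    return total / len(seq)


def laplace(past: List[int]) -> float:
    """Count-based predictor with a uniform prior: (zeros + 1) / (n + 2)."""
    return (past.count(0) + 1) / (len(past) + 2)


seq = [0, 0, 0, 1, 0, 0, 0, 0]
print(empirical_entropy(seq, lambda past: 0.5))   # 1.0 bit/symbol
print(empirical_entropy(seq, laplace))
```

Lower values mean fewer bits per symbol, so a technique "outperforms" a baseline on a sequence when its value is smaller.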

[0172] In video coding, the underlying probability models of emitted symbols are often unknown and/or may be too complex to describe fully. Therefore, designing a good model for entropy coding in video coding can be challenging; for example, a model that is effective for one sequence may perform poorly on another. As used herein, a model can be a lossless (entropy) coding scheme or a parameter thereof, that is, any parameter or method that affects probability estimation for the purpose of entropy coding. For example, a model can define the probabilities used to encode and decode decisions at internal nodes of a token tree (such as the probabilities described with respect to FIG. 7). In this case, by modifying the baseline model of the probability estimation described herein, the two-pass process of learning the probabilities of the current frame can be simplified to a single pass. In another example, a model can define a context derivation method; in this case, implementations according to this disclosure can be used to combine the probability estimates generated by multiple such methods. In yet another example, a model can define an entirely new lossless coding algorithm.

[0173] The probability update algorithms for entropy coding described herein can incorporate an average of different models with fast and slow update rates. They can include count-based MLE estimators. Conditional probabilities and dictionary search can optionally be used. Implementations also allow for adaptive fusion of models.

[0174] For ease of explanation, the techniques described herein are each depicted and described as a series of blocks, steps, or operations. However, blocks, steps, or operations in accordance with this disclosure can occur in various orders and/or concurrently. Additionally, other steps or operations not presented and described herein may be used. Furthermore, not all illustrated steps or operations may be required to implement a technique in accordance with the disclosed subject matter.

[0175] The foregoing aspects of encoding and decoding illustrate some examples of encoding and decoding techniques. However, it should be understood that encoding and decoding (as used in the claims) can refer to compression, decompression, transformation, or any other processing or alteration of data.

[0176] The word “example” is used herein to mean serving as an example, instance, or illustration. Any aspect or design described herein as an “example” is not necessarily to be construed as preferred or advantageous over other aspects or designs. Rather, use of the word “example” is intended to present concepts in a concrete fashion. As used in this application, the term “or” is intended to mean an inclusive “or” rather than an exclusive “or.” That is, unless specified otherwise or clear from context, the statement “X comprises A or B” is intended to mean any of the natural inclusive permutations thereof. That is, if X comprises A, X comprises B, or X comprises both A and B, then “X comprises A or B” is satisfied under any of the foregoing instances. In addition, the articles “a” and “an” as used in this application and the appended claims should generally be construed to mean “one or more” unless specified otherwise or clear from context to be directed to a singular form. Moreover, use of the term “an implementation” or “one implementation” throughout this disclosure is not intended to mean the same embodiment or implementation unless described as such.

[0177] Transmitting station 102 and/or receiving station 106 (and the algorithms, methods, instructions, etc., stored thereon and/or executed thereby, including by encoder 400 and decoder 500) can be implemented in hardware, software, or any combination thereof. The hardware can include, for example, computers, intellectual property (IP) cores, application-specific integrated circuits (ASICs), programmable logic arrays, optical processors, programmable logic controllers, microcode, microcontrollers, servers, microprocessors, digital signal processors, or any other suitable circuit. In the claims, the term "processor" should be understood as encompassing any of the foregoing hardware, either singly or in combination. The terms "signal" and "data" are used interchangeably. Further, portions of transmitting station 102 and receiving station 106 do not necessarily have to be implemented in the same manner.

[0178] Further, in one aspect, transmitting station 102 or receiving station 106 can be implemented using a general-purpose computer or general-purpose processor with a computer program that, when executed, carries out any of the respective methods, algorithms, and/or instructions described herein. In addition, or alternatively, a special-purpose computer/processor can be utilized, which can contain other hardware for carrying out any of the methods, algorithms, or instructions described herein.

[0179] Transmitting station 102 and receiving station 106 can, for example, be implemented on computers in a video conferencing system. Alternatively, transmitting station 102 can be implemented on a server, and receiving station 106 can be implemented on a device separate from the server, such as a handheld communications device. In this instance, transmitting station 102, using encoder 400, can encode content into an encoded video signal and transmit the encoded video signal to the communications device. In turn, the communications device can decode the encoded video signal using decoder 500. Alternatively, the communications device can decode content stored locally on the communications device, for example, content that was not transmitted by transmitting station 102. Other suitable implementations of transmitting station 102 and receiving station 106 are available. For example, receiving station 106 can be a generally stationary personal computer rather than a portable communications device, and/or a device including encoder 400 may also include decoder 500.

[0180] Further, all or a portion of the implementations of this disclosure can take the form of a computer program product accessible from, for example, a computer-usable or computer-readable medium. A computer-usable or computer-readable medium can be any device that can, for example, tangibly contain, store, communicate, or transport the program for use by or in connection with any processor. The medium can be, for example, an electronic, magnetic, optical, electromagnetic, or semiconductor device. Other suitable media are also available.

[0181] The above-described embodiments and other aspects have been provided to facilitate a good understanding of this disclosure, but without limiting it. Rather, this disclosure is intended to cover various modifications and equivalent arrangements included within the scope of the appended claims, which should be given the broadest interpretation to cover all such modifications and equivalent structures permitted by law.

Claims

1. A method for entropy coding a sequence of symbols, comprising: determining a first probability model for entropy coding the sequence, the first probability model being one of a plurality of available probability models; entropy coding at least one symbol of the sequence using a probability determined by the first probability model; after entropy coding the at least one symbol of the sequence, determining a first probability estimate using the first probability model to update the probability; determining a second probability estimate using a second probability model to update the probability, wherein the second probability model is different from the first probability model; and entropy coding a subsequent symbol of the sequence, relative to the at least one symbol, using the probability updated by a combination of the first probability estimate and the second probability estimate.

2. The method of claim 1, wherein the first probability model comprises a context-adaptive binary arithmetic coding (CABAC) model or an AV1 model.

3. The method of claim 1, wherein the first probability model comprises a maximum likelihood estimate of a Bernoulli distribution.

4. The method of claim 1, wherein the at least one symbol comprises a plurality of symbols whose number reaches a minimum number of symbols.

5. The method according to any one of claims 1 to 4, further comprising: forming the combination as a linear combination of the first probability estimate and the second probability estimate.

6. The method according to any one of claims 1 to 4, wherein the combination is a weighted combination of the first probability estimate and the second probability estimate.

7. The method of claim 6, wherein the weighted combination uses fixed weights.

8. The method of claim 6, wherein the weighted combination uses variable weights.

9. The method according to any one of claims 1 to 4, further comprising: determining a third probability estimate using a third probability model to update the probability, wherein the combination comprises a combination of the first probability estimate, the second probability estimate, and the third probability estimate, and the third probability model is different from each of the first probability model and the second probability model.

10. The method of claim 9, wherein the combination of the first probability estimate, the second probability estimate, and the third probability estimate is a linear combination using a weighted average of the first probability estimate, the second probability estimate, and the third probability estimate.

11. The method according to claim 10, wherein the weights used for the weighted average are updated using stochastic gradient descent (SGD).

12. The method according to claim 11, wherein the first probability model comprises decreasing-step-size SGD, averaged-parameter SGD, dynamic-parameter SGD, or batch SGD.

13. The method according to claim 1, wherein the at least one symbol comprises a first symbol, and the method comprises: entropy coding each symbol after the first symbol using the probability used to entropy code the preceding symbol, as updated using a combination of the first probability estimate and the second probability estimate.

14. The method according to claim 13, wherein the combination uses adaptive weighting of the first probability estimate and the second probability estimate.

15. An apparatus comprising a processor configured to perform the method according to any one of claims 1 to 14.