Video processing method and apparatus, and medium

By using a deep Q-network method based on reinforcement learning, the problem of finding the optimal bitrate control scheme in existing video coding technologies is solved, achieving more efficient video coding and bandwidth utilization, and simplifying the video coding process.

WO2026138991A1 · PCT designated stage · Publication Date: 2026-07-02 · DOUYIN VISION CO LTD +2

Patent Information

Authority / Receiving Office: WO · WO
Patent Type: Applications
Current Assignee / Owner: DOUYIN VISION CO LTD
Filing Date: 2025-12-25
Publication Date: 2026-07-02

Figure CN2025145762_02072026_PF_FP_ABST

Abstract

Embodiments of the present disclosure provide a solution for video processing. A method for video processing is provided. The method comprises: for a conversion between a video unit of a video and a bitstream of the video, determining a first set of bitrate control parameters for the video unit on the basis of a perceptron in an entity; determining a third set of bitrate control parameters on the basis of the first set of bitrate control parameters and a second set of bitrate control parameters, wherein the second set of bitrate control parameters is an optimal set of bitrate control parameters obtained during the training of the entity; and performing the conversion on the basis of the third set of bitrate control parameters.

Description

Methods, apparatus and media for video processing

Technical Field

[0001] The embodiments of this disclosure generally relate to video processing techniques, and more specifically, to a search method for a near-optimal solution for bitrate control based on reinforcement learning.

Background Technology

[0002] Today, digital video capabilities are being applied to all aspects of people's lives. Various video compression technologies have been proposed for video encoding/decoding, such as MPEG-2, MPEG-4, ITU-T H.263, ITU-T H.264 / MPEG-4 Part 10 Advanced Video Coding (AVC), the ITU-T H.265 High Efficiency Video Coding (HEVC) standard, and the Versatile Video Coding (VVC) standard. However, conventional video codecs have several undesirable problems. Therefore, there is a general expectation to further improve the coding gain of conventional video coding technologies.

Summary of the Invention

[0003] Embodiments of this disclosure provide a solution for video processing.

[0004] In a first aspect, a method for video processing is proposed. This method includes: for a conversion between a video unit of a video and a bitstream of the video, determining a first set of bitrate control parameters for the video unit based on a perceptron in an entity; determining a third set of bitrate control parameters based on the first set and a second set of bitrate control parameters, wherein the second set of bitrate control parameters is the optimal set of bitrate control parameters obtained during the training of the entity; and performing the conversion based on the third set of bitrate control parameters. In this manner, the method draws on the concept of a Deep Q Network, which enables a near-optimal bitrate control scheme to be found in a shorter time.

[0005] In a second aspect, an apparatus for video processing is provided. The apparatus includes a processor and a non-transitory memory having instructions thereon. When executed by the processor, the instructions cause the processor to perform the method according to the first aspect of this disclosure.

[0006] In a third aspect, a non-transitory computer-readable storage medium is proposed. This non-transitory computer-readable storage medium stores instructions that cause a processor to execute the method according to the first aspect of this disclosure.

[0007] In a fourth aspect, another non-transitory computer-readable recording medium is proposed. This non-transitory computer-readable recording medium stores a bitstream of video generated by a method performed by a video processing apparatus. The method includes: determining a first set of bitrate control parameters for a video unit of the video based on a perceptron in an entity; determining a third set of bitrate control parameters based on the first set of bitrate control parameters and a second set of bitrate control parameters, wherein the second set of bitrate control parameters is the optimal set of bitrate control parameters during the training of the entity; and generating a bitstream of the video unit based on the third set of bitrate control parameters.

[0008] In a fifth aspect, a method for storing a bitstream of video is proposed. The method includes: determining a first set of bitrate control parameters for a video unit of the video based on a perceptron in an entity; determining a third set of bitrate control parameters based on the first set and a second set of bitrate control parameters, wherein the second set of bitrate control parameters is the optimal set of bitrate control parameters obtained during the training of the entity; generating a bitstream of the video unit based on the third set of bitrate control parameters; and storing the bitstream in a non-transitory computer-readable recording medium.

[0009] This summary presents, in a simplified form, a selection of concepts that are further described below in the detailed description. This summary is not intended to identify key or essential features of the claimed subject matter, nor is it intended to limit the scope of the claimed subject matter.

Brief Description of the Drawings

[0010] The above and other objects, features, and advantages of exemplary embodiments of the present disclosure will become more apparent from the following detailed description with reference to the accompanying drawings. In the exemplary embodiments of the present disclosure, the same reference numerals generally refer to the same components.

[0011] Figure 1 shows a block diagram of an example video codec system according to some embodiments of the present disclosure;

[0012] Figure 2 shows a block diagram of a first example video encoder according to some embodiments of the present disclosure;

[0013] Figure 3 shows a block diagram of an example video decoder according to some embodiments of the present disclosure;

[0014] Figure 4 illustrates an algorithm framework diagram according to some embodiments of the present disclosure;

[0015] Figure 5 illustrates a flowchart of a method for video processing according to an embodiment of the present disclosure; and

[0016] Figure 6 shows a block diagram of a computing device in which various embodiments of the present disclosure may be implemented.

[0017] In all the accompanying drawings, the same or similar reference numerals usually refer to the same or similar elements.

Detailed Description

[0018] The principles of this disclosure will now be described with reference to some embodiments. It should be understood that these embodiments are described for illustrative purposes only and to help those skilled in the art understand and implement this disclosure, and do not imply any limitation on the scope of this disclosure. In addition to the methods described below, the disclosure described herein can be implemented in various other ways.

[0019] In the following description and claims, unless otherwise defined, all scientific and technical terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this disclosure pertains.

[0020] The terms "an embodiment," "embodiment," "example embodiment," etc., used in this disclosure refer to embodiments that may include specific features, structures, or characteristics, but not every embodiment is required to include that specific feature, structure, or characteristic. Furthermore, these phrases do not necessarily refer to the same embodiment. Additionally, when a specific feature, structure, or characteristic is described in conjunction with an example embodiment, it is submitted that it is within the knowledge of those skilled in the art to apply such a feature, structure, or characteristic in connection with other embodiments, whether or not explicitly described.

[0021] It should be understood that although the terms “first” and “second”, etc., can be used to describe various elements, these elements should not be limited by these terms. These terms are used only to distinguish one element from another. For example, a first element may be referred to as a second element, and similarly, a second element may be referred to as a first element, without departing from the scope of the exemplary embodiments. As used herein, the term “and/or” includes any and all combinations of one or more of the listed terms.

[0022] The terminology used herein is for the purpose of describing particular embodiments only and is not intended to limit the exemplary embodiments. As used herein, the singular forms “a,” “an,” and “the” are also intended to include the plural forms unless the context clearly indicates otherwise. It should also be understood that the terms “comprising,” “including,” and/or “having” as used herein indicate the presence of the stated features, elements, and/or components, but do not exclude the presence or addition of one or more other features, elements, components, and/or combinations thereof.

Example Environment

[0023] Figure 1 is a block diagram illustrating an example video coding system 100 that can utilize the techniques of this disclosure. As shown, the video coding system 100 may include a source device 110 and a destination device 120. The source device 110 may also be referred to as a video encoding device, and the destination device 120 may also be referred to as a video decoding device. In operation, the source device 110 may be configured to generate encoded video data, and the destination device 120 may be configured to decode the encoded video data generated by the source device 110. The source device 110 may include a video source 112, a video encoder 114, and an input/output (I/O) interface 116.

[0024] Video source 112 may include sources such as video capture devices. Examples of video capture devices include, but are not limited to, interfaces for receiving video data from video content providers, computer graphics systems for generating video data, and / or combinations thereof.

[0025] Video data may include one or more pictures. The video encoder 114 encodes the video data from the video source 112 to generate a bitstream. The bitstream may include a sequence of bits forming a coded representation of the video data. The bitstream may include coded pictures and associated data. A coded picture is a coded representation of a picture. The associated data may include sequence parameter sets, picture parameter sets, and other syntax structures. The I/O interface 116 may include a modulator/demodulator and/or a transmitter. Encoded video data can be transmitted directly to the destination device 120 via the network 130A through the I/O interface 116. Encoded video data may also be stored on the storage medium/server 130B for access by the destination device 120.

[0026] The destination device 120 may include an I/O interface 126, a video decoder 124, and a display device 122. The I/O interface 126 may include a receiver and/or a modem. The I/O interface 126 may acquire encoded video data from the source device 110 or the storage medium/server 130B. The video decoder 124 may decode the encoded video data. The display device 122 may display the decoded video data to a user. The display device 122 may be integrated with the destination device 120, or it may be external to the destination device 120, in which case the destination device 120 is configured to interface with the external display device.

[0027] The video encoder 114 and the video decoder 124 can operate according to video compression standards, such as the High Efficiency Video Coding (HEVC) standard, the Versatile Video Coding (VVC) standard, and other existing and/or future standards.

[0028] Figure 2 is a block diagram illustrating an example of a video encoder 200 according to some embodiments of the present disclosure. The video encoder 200 may be an example of the video encoder 114 in the system 100 shown in Figure 1.

[0029] The video encoder 200 can be configured to implement any or all of the technologies disclosed herein. In the example of Figure 2, the video encoder 200 includes multiple functional components. The technologies described in this disclosure can be shared among the various components of the video encoder 200. In some examples, the processor can be configured to perform any or all of the technologies described in this disclosure.

[0030] In some embodiments, the video encoder 200 may include a segmentation unit 201, a prediction unit 202, a residual generation unit 207, a transform unit 208, a quantization unit 209, an inverse quantization unit 210, an inverse transform unit 211, a reconstruction unit 212, a buffer 213, and an entropy coding unit 214. The prediction unit 202 may include a mode selection unit 203, a motion estimation unit 204, a motion compensation unit 205, and an intra-frame prediction unit 206.

[0031] In other examples, the video encoder 200 may include more, fewer, or different functional components. In one example, the prediction unit 202 may include an intra-block copy (IBC) unit. The IBC unit can perform prediction in an IBC mode, where at least one reference picture is the picture in which the current video block is located.

[0032] Furthermore, although some components (such as motion estimation unit 204 and motion compensation unit 205) can be integrated, for illustrative purposes, these components are shown separately in the example of Figure 2.

[0033] The segmentation unit 201 can segment an image into one or more video blocks. The video encoder 200 and the video decoder 300 can support various video block sizes.

[0034] The mode selection unit 203 can select one of several coding modes (intra-frame coding or inter-frame coding) based, for example, on an error result, and provide the resulting intra-coded or inter-coded block to the residual generation unit 207 to generate residual block data, and to the reconstruction unit 212 to reconstruct the coded block for use as a reference picture. In some examples, the mode selection unit 203 can select a combined inter and intra prediction (CIIP) mode, where prediction is based on both an inter-frame prediction signal and an intra-frame prediction signal. In the case of inter-frame prediction, the mode selection unit 203 can also select a resolution for the motion vector of the block (e.g., sub-pixel precision or integer-pixel precision).

[0035] To perform inter-frame prediction on the current video block, motion estimation unit 204 can generate motion information for the current video block by comparing one or more reference frames from buffer 213 with the current video block. Motion compensation unit 205 can determine the predicted video block for the current video block based on the motion information and decoded samples of images from buffer 213 other than the image associated with the current video block.

[0036] The motion estimation unit 204 and the motion compensation unit 205 can perform different operations on the current video block, for example, depending on whether the current video block is in an I-slice, a P-slice, or a B-slice. As used herein, an "I-slice" can refer to a portion of a picture composed of macroblocks, all of which are based on macroblocks within the same picture. Furthermore, as used herein, in some aspects, "P-slice" and "B-slice" can refer to portions of a picture composed of macroblocks that do not depend on macroblocks within the same picture.

[0037] In some examples, motion estimation unit 204 can perform unidirectional prediction on the current video block, and can search reference images in list 0 or list 1 to find a reference video block for the current video block. Motion estimation unit 204 can then generate a reference index indicating the reference image containing the reference video block in list 0 or list 1, and a motion vector indicating the spatial displacement between the current video block and the reference video block. Motion estimation unit 204 can output the reference index, prediction direction indicator, and motion vector as motion information for the current video block. Motion compensation unit 205 can generate a predicted video block for the current video block based on the reference video block indicated by the motion information of the current video block.

[0038] Alternatively, in other examples, motion estimation unit 204 can perform bidirectional prediction on the current video block. Motion estimation unit 204 can search for reference images in list 0 to find a reference video block for the current video block, and can also search for reference images in list 1 to find another reference video block for the current video block. Motion estimation unit 204 can then generate reference indices indicating the reference images containing the reference video blocks in lists 0 and 1, and motion vectors indicating the spatial displacement between the reference video blocks and the current video block. Motion estimation unit 204 can output the reference index and motion vector of the current video block as motion information for the current video block. Motion compensation unit 205 can generate a predicted video block for the current video block based on the reference video blocks indicated by the motion information of the current video block.

[0039] In some examples, the motion estimation unit 204 can output a complete set of motion information for use in the decoder's decoding process. Alternatively, in some embodiments, the motion estimation unit 204 can signal the motion information of the current video block with reference to the motion information of another video block. For example, the motion estimation unit 204 can determine that the motion information of the current video block is sufficiently similar to the motion information of a neighboring video block.

[0040] In one example, the motion estimation unit 204 may indicate a value to the video decoder 300 in the syntax structure associated with the current video block, which indicates that the current video block has the same motion information as another video block.

[0041] In another example, the motion estimation unit 204 may identify another video block and a motion vector difference (MVD) in the syntax structure associated with the current video block. The motion vector difference indicates the difference between the motion vector of the current video block and the motion vector of the indicated video block. The video decoder 300 can use the motion vector of the indicated video block and the motion vector difference to determine the motion vector of the current video block.
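
The MVD relationship described above can be illustrated with a minimal Python sketch. The `MotionVector` type, the function names, and the sample values are hypothetical and chosen only for illustration; the disclosure does not specify a data layout or units.

```python
from typing import NamedTuple


class MotionVector(NamedTuple):
    x: int  # horizontal displacement (illustrative units)
    y: int  # vertical displacement (illustrative units)


def encode_mvd(current: MotionVector, indicated: MotionVector) -> MotionVector:
    """Encoder side: signal only the difference from the indicated block's vector."""
    return MotionVector(current.x - indicated.x, current.y - indicated.y)


def decode_mv(indicated: MotionVector, mvd: MotionVector) -> MotionVector:
    """Decoder side: add the signaled difference back to the indicated vector."""
    return MotionVector(indicated.x + mvd.x, indicated.y + mvd.y)


# Hypothetical values: the indicated (e.g., neighboring) block and the current block.
indicated_mv = MotionVector(12, -4)
current_mv = MotionVector(14, -3)

mvd = encode_mvd(current_mv, indicated_mv)
reconstructed_mv = decode_mv(indicated_mv, mvd)
```

When neighboring blocks move similarly, the difference is small and is typically cheaper to signal than the full vector.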

[0042] As discussed above, the video encoder 200 can signal motion vectors predictively. Two examples of predictive signaling techniques that can be implemented by the video encoder 200 are Advanced Motion Vector Prediction (AMVP) and merge mode signaling.

[0043] Intra-prediction unit 206 can perform intra-prediction on the current video block. When intra-prediction unit 206 performs intra-prediction on the current video block, it can generate prediction data for the current video block based on decoded samples from other video blocks in the same frame. The prediction data for the current video block can include the predicted video block and various syntax elements.

[0044] The residual generation unit 207 can generate residual data for the current video block by subtracting (e.g., as indicated by a minus sign) the predicted video block(s) of the current video block from the current video block. The residual data for the current video block can include residual video blocks corresponding to different sample components of the samples in the current video block.

[0045] In other examples, such as in skip mode, there may be no residual data for the current video block, and the residual generation unit 207 may not perform subtraction operations.

[0046] The transform unit 208 can generate one or more transform coefficient video blocks for the current video block by applying one or more transforms to the residual video blocks associated with the current video block.

[0047] After the transform unit 208 generates a transform coefficient video block associated with the current video block, the quantization unit 209 can quantize the transform coefficient video block associated with the current video block based on one or more quantization parameter (QP) values associated with the current video block.

[0048] The inverse quantization unit 210 and the inverse transform unit 211 can apply inverse quantization and inverse transform to the transform coefficient video block respectively to reconstruct the residual video block from the transform coefficient video block. The reconstruction unit 212 can add the reconstructed residual video block to the corresponding samples from one or more predicted video blocks generated by the prediction unit 202 to generate a reconstructed video block associated with the current video block for storage in the buffer 213.
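
The residual, quantization, inverse quantization, and reconstruction steps described in the preceding paragraphs can be sketched as follows. This is a simplified illustration under stated assumptions, not the encoder's actual arithmetic: the transform stage is omitted, a uniform step size stands in for the codec-specific mapping from QP to quantization step, and all function names and sample values are hypothetical.

```python
import math


def make_residual(current, predicted):
    """Residual generation: current samples minus predicted samples."""
    return [c - p for c, p in zip(current, predicted)]


def quantize(values, step):
    """Uniform scalar quantization with round-half-up (assumed, not from the source)."""
    return [math.floor(v / step + 0.5) for v in values]


def dequantize(levels, step):
    """Inverse quantization: scale the integer levels back by the step size."""
    return [level * step for level in levels]


def reconstruct(predicted, residual):
    """Reconstruction: add the reconstructed residual back to the prediction."""
    return [p + r for p, r in zip(predicted, residual)]


# Hypothetical 4-sample block and prediction.
current_block = [53, 55, 61, 67]
predicted_block = [50, 50, 60, 60]
step = 4  # stand-in for a QP-derived quantization step

residual = make_residual(current_block, predicted_block)
levels = quantize(residual, step)
reconstructed = reconstruct(predicted_block, dequantize(levels, step))
```

The reconstructed samples differ from the originals by at most half a step, which is the loss that a larger step (a higher QP) trades for fewer bits.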

[0049] After the video block is reconstructed by the reconstruction unit 212, a loop filtering operation can be performed to reduce blocking artifacts in the video block.

[0050] The entropy coding unit 214 can receive data from other functional components of the video encoder 200. When the entropy coding unit 214 receives data, it can perform one or more entropy encoding operations to generate entropy-encoded data and output a bitstream including the entropy-encoded data.

[0051] Figure 3 is a block diagram illustrating an example of a video decoder 300 according to some embodiments of the present disclosure. The video decoder 300 may be an example of the video decoder 124 in the system 100 shown in Figure 1.

[0052] The video decoder 300 can be configured to perform any or all of the techniques disclosed herein. In the example of Figure 3, the video decoder 300 includes multiple functional components. The techniques described in this disclosure can be shared among the various components of the video decoder 300. In some examples, the processor can be configured to perform any or all of the techniques described in this disclosure.

[0053] In the example of Figure 3, the video decoder 300 includes an entropy decoding unit 301, a motion compensation unit 302, an intra-frame prediction unit 303, an inverse quantization unit 304, an inverse transform unit 305, a reconstruction unit 306, and a buffer 307. In some examples, the video decoder 300 can perform a decoding process that is generally reciprocal to the encoding process described with respect to the video encoder 200.

[0054] The entropy decoding unit 301 can retrieve the encoded bitstream. The encoded bitstream may include entropy-encoded video data (e.g., encoded blocks of video data). The entropy decoding unit 301 can decode the entropy-encoded video data, and from the entropy-decoded video data the motion compensation unit 302 can determine motion information including motion vectors, motion vector precision, reference picture list indices, and other motion information. The motion compensation unit 302 can determine this information, for example, by performing AMVP and merge mode. When AMVP is used, several most probable candidates are derived based on data from adjacent prediction blocks (PBs) and reference pictures. Motion information typically includes horizontal and vertical motion vector displacement values, one or two reference picture indices, and, in the case of a prediction region in a B-slice, an identifier of which reference picture list is associated with each index. As used herein, in some aspects, "merge mode" may refer to deriving motion information from spatially or temporally adjacent blocks.

[0055] The motion compensation unit 302 can generate motion-compensated blocks and can perform interpolation based on an interpolation filter. Identifiers of the interpolation filters to be used with sub-pixel precision can be included in the syntax elements.

[0056] The motion compensation unit 302 can use the interpolation filter used by the video encoder 200 during the encoding of the video block to calculate the interpolation for sub-integer pixels of the reference block. The motion compensation unit 302 can determine the interpolation filter used by the video encoder 200 based on the received syntax information, and the motion compensation unit 302 can use the interpolation filter to generate the prediction block.

[0057] The motion compensation unit 302 may use at least some of the syntax information to determine the block sizes used to encode the frame(s) and/or slice(s) of the encoded video sequence, partition information describing how each macroblock of a picture of the encoded video sequence is partitioned, modes indicating how each partition is encoded, one or more reference frames (and reference frame lists) for each inter-coded block, and other information for decoding the encoded video sequence. As used herein, in some aspects, a “slice” can refer to a data structure that can be decoded independently of other slices of the same picture in terms of entropy coding, signal prediction, and residual signal reconstruction. A slice can be the entire picture or a region of the picture.

[0058] The intra-frame prediction unit 303 can use, for example, an intra-prediction mode received in the bitstream to form prediction blocks from spatially adjacent blocks. The inverse quantization unit 304 inverse-quantizes (i.e., de-quantizes) the quantized video block coefficients provided in the bitstream and decoded by the entropy decoding unit 301. The inverse transform unit 305 applies the inverse transform.

[0059] The reconstruction unit 306 can obtain the decoded block, for example, by adding the residual block to the corresponding predicted block generated by the motion compensation unit 302 or the intra-frame prediction unit 303. If necessary, a deblocking filter can also be applied to the decoded block to remove blocking artifacts. The decoded video block is then stored in the buffer 307, which provides reference blocks for subsequent motion compensation/intra-frame prediction and also produces decoded video for presentation on a display device.

[0060] Some exemplary embodiments of this disclosure will be described in detail below. It should be understood that section headings are used in this document for ease of understanding and not to limit the embodiments disclosed in a section to that section only. Furthermore, while some embodiments are described with reference to Versatile Video Coding or other specific video codecs, the disclosed techniques are also applicable to other video coding techniques. Additionally, although some embodiments describe video encoding steps in detail, it will be understood that the corresponding decoding steps that undo the encoding will be performed by a decoder. Furthermore, the term "video processing" includes video coding or compression, video decoding or decompression, and video transcoding, in which video pixels are converted from one compressed format to another compressed format or to a different compressed bitrate.

[0061] As used herein, the term “video unit” or “video block” can be a sequence, picture, slice, tile, brick, subpicture, coding tree unit (CTU) / coding tree block (CTB), CTU/CTB row, one or more coding units (CU) / coding blocks (CB), one or more CTUs/CTBs, one or more virtual pipeline data units (VPDU), or a sub-region within a picture/slice/tile/brick.

[0062] As used herein, the term "agent" can refer to an entity in reinforcement learning (RL) that performs actions and receives feedback from the environment. Its goal is to learn, through interaction with the environment, how to choose the optimal action so as to maximize long-term reward.

[0063] As used herein, the term "environment" can refer to the external system with which an agent interacts and from which it learns in reinforcement learning. It defines the actions that the agent can perform, the effects of these actions on the environment, and the resulting feedback (rewards and states).

[0064] As used herein, the term "action" can refer to the behavior or operation that an agent chooses in a given state during reinforcement learning. Agents interact with their environment by performing actions and update their policies based on feedback from the environment (rewards and new states) to maximize long-term rewards.

[0065] As used herein, the term "state" can be a description or representation of the environment at a given moment in reinforcement learning. It captures all the important information about the environment, enabling the agent to make decisions based on this information. The state is the basis for the agent's interaction with the environment, and the agent's task is to choose the appropriate action based on the current state to maximize long-term rewards.

[0066] As used herein, the term "reward" can refer to the feedback signal given by the environment after an agent performs an action in reinforcement learning. The reward can be a scalar value used to measure the "goodness" or "usefulness" of the action in the current state. Agents evaluate their behavior through rewards and optimize their policies by maximizing cumulative rewards.

[0067] As used herein, the term "action-value function" refers to the function used in reinforcement learning to evaluate the expected cumulative reward that can be obtained after taking an action in a specific state. The action-value function helps the agent determine whether performing a particular action in a given state is good or bad, thereby guiding its decision-making.
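
For reference, the action-value function is conventionally written as below. This is the standard textbook form, not notation taken from this disclosure: the policy \pi, the discount factor \gamma \in [0, 1], and the time indices are assumptions introduced here for illustration.

```latex
% Expected cumulative (discounted) reward after taking action a in state s
% and following policy \pi afterwards.
Q^{\pi}(s, a) = \mathbb{E}_{\pi}\!\left[ \sum_{k=0}^{\infty} \gamma^{k}\, r_{t+k+1} \,\middle|\, s_t = s,\ a_t = a \right]

% Bellman optimality equation satisfied by the optimal action-value function,
% which Q-learning methods such as a Deep Q Network approximate.
Q^{*}(s, a) = \mathbb{E}\!\left[ r_{t+1} + \gamma \max_{a'} Q^{*}(s_{t+1}, a') \,\middle|\, s_t = s,\ a_t = a \right]
```

An agent that acts greedily with respect to Q^{*} chooses, in each state, the action with the largest value.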

[0068] In video encoding and transmission systems, rate control is a key technology for ensuring a balance between video quality and bandwidth utilization. Its main goal is to control the bitrate of the video stream within a given bandwidth, guaranteeing visual quality while avoiding excessive bandwidth waste. The solution involves specifying encoding parameters for each frame of the input video.

[0069] As used herein, the term "queue" refers to a common data structure that follows the First In First Out (FIFO) principle, meaning that the element that entered the queue first is the first to be removed. It can be viewed as a linear structure in which insertion occurs at the tail of the queue and removal occurs at the head. Queues are widely used in computer science and in practice, particularly in task scheduling, resource management, and message passing.
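
The FIFO behavior described above can be shown with Python's standard `collections.deque`. The stored values are hypothetical frame indices; the passage above does not state what the queue holds in the disclosed method.

```python
from collections import deque

# Hypothetical contents: indices of frames waiting to be processed.
queue = deque()
for frame_index in (0, 1, 2):
    queue.append(frame_index)  # insertion at the tail

first_out = queue.popleft()    # removal from the head
remaining = list(queue)        # elements still waiting, oldest first
```

A `deque` is used rather than a plain list because removal from the head of a list takes time proportional to its length, whereas `popleft` runs in constant time.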

[0070] In recent years, numerous research works have proposed relatively accurate and high-performance video bitrate control methods. For example, some works have focused on modeling the relationship between bitrate and quantization parameters and building bitrate control methods on that relationship. Besides these works based on manually designed models, some have attempted to apply reinforcement learning to bitrate control. While these works have made progress in bitrate control, they have not explored the optimal bitrate control scheme for each video, nor have they investigated the upper limit of bitrate control algorithms. In fact, finding the optimal bitrate control scheme for each video is an extremely difficult task, because the coupling relationships between encoded images in video encoding are very complex, making it impossible to obtain an analytical solution through direct modeling. Specifically, there are two main reasons: 1) video encoding employs inter-frame prediction, resulting in complex image reference relationships such as bidirectional and multi-reference prediction; 2) rate-distortion optimization leads to complex decisions among different prediction methods such as intra-frame prediction and inter-frame prediction. Because of the limitations of manually designed models, works based on them have difficulty finding the optimal bitrate control scheme; and for reinforcement learning-based methods, a single agent also struggles to find the optimal bitrate control scheme when faced with diverse video content and complex image relationships.

[0071] Finding the optimal bitrate control scheme for each video is crucial. The optimal scheme can be used to optimize video bitrate control methods and also to provide labels for supervised learning, enabling the training of deep learning-based bitrate control methods. However, as the analysis above shows, finding the optimal scheme is extremely difficult, often requiring brute-force search. To obtain the optimal scheme, brute-force search involves a huge search space and has a time complexity of O(n^p), where p represents the number of bitrate control parameters. As the number of video frames and bitrate control parameters increases, finding the optimal bitrate control scheme in a finite amount of time becomes impossible.

[0072] To at least partially address the aforementioned problems, embodiments of this disclosure propose a reinforcement learning-based search method. For example, in some embodiments, the bitrate control problem for each video is first modeled as a Markov decision process, and a reward function is formulated based on the quality of the encoded image and the bitrate error. Then, utilizing the established Markov decision process for the bitrate control problem, a search method based on a Deep Q Network is developed, comprising two steps: exploration and exploitation. During exploration, an agent is trained according to the Markov decision process, and high-performing bitrate control schemes are recorded throughout the training process. Subsequently, in the exploitation step, the trained agent estimates the bitrate control scheme. Finally, the approximate optimal bitrate control scheme for the input video is determined by exploring and exploiting the obtained bitrate control schemes. Through embodiments of this disclosure, the time complexity of the search is controllable, i.e., O(mn), where m represents the training period. Furthermore, the bitrate control schemes according to the embodiments of this disclosure have bitrate errors and encoding quality results that are very close to the optimal scheme, and compared with brute-force search, the embodiments of this disclosure require less time, thereby finding an approximately optimal bitrate control scheme for each video within a limited time.

[0073] Figure 4 illustrates an example diagram of an algorithm framework 400 according to some embodiments of the present disclosure. As shown in Figure 4, the algorithm framework 400 includes an exploration process 410 and an exploitation process 420.

[0074] The primary objective of exploration process 410 is to train an entity (e.g., an agent) to learn how to formulate a bitrate control scheme for the input video. As shown in Figure 4, entity 411 includes an action-value function, which can be represented as $Q^{\pi}(S_i, A_i, \theta)$, where $S_i$ represents the state parameter, $A_i$ represents the quantization parameter, and $\theta$ represents the weights. In some embodiments, as shown in Figure 4, the action-value function can be constructed by a perceptron 412. For example, the action-value function can be constructed by a 4-layer perceptron. In some embodiments, the input dimension of the perceptron 412 can be any suitable integer, such as 4. In some embodiments, the output dimension of the perceptron 412 is the range of quantization parameters used to encode the image. In some embodiments, the hidden layer dimension of the perceptron 412 is 512. In some embodiments, the perceptron 412 can be implemented by a deep Q-network. As used herein, the term "perceptron" refers to a linear binary classification model, which is the basis of neural networks and support vector machines, and is one of the fundamental classification algorithms in machine learning. The terms "perceptron," "perception machine," and "multilayer perceptron" are used interchangeably herein.
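The perceptron dimensions given above can be illustrated with a small NumPy sketch. It is not the disclosed implementation. The following are assumptions made only for illustration: "4-layer" is read as four linear layers, the hidden activation is ReLU, the quantization parameter range is taken as 0 to 51 (52 outputs), and the weight initialization scale is arbitrary.

```python
import numpy as np

STATE_DIM = 4     # input dimension given in the text
HIDDEN_DIM = 512  # hidden layer dimension given in the text
NUM_QP = 52       # assumption: QP values 0..51; the text only says "range"

def init_mlp(rng, sizes):
    """Return a list of (weight, bias) pairs with small random weights."""
    return [
        (rng.normal(0.0, 0.01, size=(n_in, n_out)), np.zeros(n_out))
        for n_in, n_out in zip(sizes[:-1], sizes[1:])
    ]

def q_values(params, state):
    """Forward pass: ReLU on hidden layers (assumed), linear output layer."""
    x = np.asarray(state, dtype=np.float64)
    for w, b in params[:-1]:
        x = np.maximum(x @ w + b, 0.0)
    w, b = params[-1]
    return x @ w + b  # one action value per candidate QP

rng = np.random.default_rng(0)
# Assumption: four linear layers, i.e. three hidden layers of width 512.
theta = init_mlp(rng, [STATE_DIM, HIDDEN_DIM, HIDDEN_DIM, HIDDEN_DIM, NUM_QP])
values = q_values(theta, [0.5, 0.1, 0.9, 0.0])  # hypothetical state vector
```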

[0075] In some embodiments, entity 411 may further include replay memory 413. In other embodiments, replay memory 413 may be a separate storage device. Replay memory 413 may store training samples obtained through the interaction between entity 411 and environment 414. For example, to train the perceptron 412 in the agent, entity 411 interacts with the training environment 414 to obtain training data, which is stored in the replay memory 413. Then, a certain amount of training data is randomly selected from the replay memory 413 to train the perceptron 412.
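The store-then-sample behavior of this memory can be sketched as follows. The class name `ReplayMemory`, its method names, and the choice of a bounded `deque` are assumptions made for illustration, not details taken from the disclosure.

```python
import random
from collections import deque

class ReplayMemory:
    """Fixed-capacity FIFO store of (state, action, reward, next_state) tuples."""

    def __init__(self, capacity):
        self.buffer = deque(maxlen=capacity)  # oldest samples are evicted first

    def store(self, state, action, reward, next_state):
        self.buffer.append((state, action, reward, next_state))

    def sample(self, batch_size, rng=random):
        """Uniformly sample a batch without replacement."""
        return rng.sample(list(self.buffer), batch_size)

    def __len__(self):
        return len(self.buffer)

memory = ReplayMemory(capacity=3)
for step in range(5):
    memory.store(state=step, action=30, reward=0.0, next_state=step + 1)
batch = memory.sample(batch_size=2, rng=random.Random(0))
```

With a capacity of 3, only the three most recent of the five stored samples remain available for sampling.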

[0076] As shown in Figure 4, the training environment 414 has a video encoder 415. Video 416 is the input video to the video encoder 415 for training entity 411. For example, video 416 may have N images, which can be represented as $\{P_1, \ldots, P_N\}$. In some embodiments, the input for training entity 411 further includes one or more of the following: the target bitrate ($BR_t$), the total training period of entity 411 ($E_n$), the replay memory capacity of entity 411 ($N_m$), the amount of training data for the entity ($BS$), or the update period of the target network ($E_u$).

[0077] In some embodiments, the perceptron can be initialized. For example, the perceptron 412, $Q^{\pi}(S_i, A_i, \theta)$, can be initialized based on random weights $\theta$. Additionally or alternatively, the target network can be initialized using the weights $\theta$. For example, the target network ${Q'}^{\pi}(S_{i+1}, A_i, \theta')$ can be initialized using the weights $\theta$, and the replay memory $M$ can be initialized as a queue with the replay memory capacity $N_m$.

[0078] In some embodiments, the state parameters can be determined based on the bit consumption of video 416, the compression quality of the encoded images, and the number of images to be encoded. For example, the state parameters can be determined according to formula (1), where $S_i$ represents the state parameter, $BR_t$ represents the target video bitrate, $FPS$ represents the video frame rate, $N$ represents the number of images in the video, $Q_i$ represents the quality of the encoded image, and $B_j$ represents the number of bits consumed by the encoded j-th image, where i and j are integers.

[0079] In some embodiments, the quantization parameters can be determined based on state parameters and an initialized perceptron. For example, $S_i$ can be input into $Q^{\pi}(S_i, A_i, \theta)$ to obtain the quantization parameter $A_i$. In some embodiments, the state parameter can be fed into an initialized perceptron 412 to obtain a list of action values. This list of action values may include one or more quantization parameters and a corresponding reward for each (the cumulative reward starting from state $S_i$). In some embodiments, the quantization parameter can be determined based on the list of action values and a search strategy. For example, a probability value can be sampled from a uniform distribution. In this case, if the probability value is greater than a probability threshold, the quantization parameter with the highest reward in the action value list is selected. If the probability value is less than the probability threshold, the quantization parameter is selected at random from the action value list. As an example only, with a probability threshold of 0.5: if the sampled probability value is greater than 0.5, the action corresponding to the maximum value in the action value list is selected as the quantization parameter; if it is less than 0.5, a quantization parameter is selected at random.
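The search strategy in this paragraph can be sketched in a few lines of Python. The function name `select_qp` and the use of list indices as quantization parameters are illustrative assumptions. The text does not say what happens when the sampled value equals the threshold; the sketch treats that case as a random pick.

```python
import random

def select_qp(action_values, threshold=0.5, rng=random):
    """Pick a QP index from a list of action values.

    A probability is drawn from U(0, 1). Above the threshold the
    highest-valued action is taken; otherwise a random one is taken.
    """
    p = rng.random()
    if p > threshold:
        return max(range(len(action_values)), key=action_values.__getitem__)
    return rng.randrange(len(action_values))

values = [0.1, 0.7, 0.3]
# threshold=0.0 makes the greedy branch (practically) certain.
greedy = select_qp(values, threshold=0.0, rng=random.Random(1))
# threshold=1.0 makes the random branch certain, since p < 1 always.
explored = select_qp(values, threshold=1.0, rng=random.Random(1))
```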

[0080] In some embodiments, the next image to be encoded can be encoded based on the quantization parameters. For example, the selected quantization parameter $A_i$ is input into the video encoder 415 in environment 414 to encode the next image to be encoded.

[0081] In some embodiments, the reward for the quantization parameters and the next state parameters can be determined based on the coding quality and bitrate error. For example, the reward can be determined as:

$$R_i = \begin{cases} \dfrac{Q_i}{55.0 \times N} - \dfrac{|BR_t - BR_e|}{BR_t} \times 100, & \text{last picture} \\ \dfrac{Q_i}{55.0 \times N}, & \text{others} \end{cases} \quad (2)$$

In some embodiments, the next state $S_{i+1}$ can be determined based on formula (1).
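Formula (2) translates directly into code. In the sketch below the function and argument names are illustrative, and the numeric inputs are made-up values chosen only to exercise both branches.

```python
def reward(quality, num_pictures, is_last,
           target_bitrate=None, encoded_bitrate=None):
    """Per-picture reward following formula (2).

    The two bitrate arguments are required only when is_last is True.
    """
    r = quality / (55.0 * num_pictures)
    if is_last:
        # The bitrate-error penalty is applied only on the last picture.
        r -= abs(target_bitrate - encoded_bitrate) / target_bitrate * 100
    return r

# Hypothetical values: quality 44.0, 10 pictures, 1% bitrate error.
r_mid = reward(44.0, 10, is_last=False)
r_last = reward(44.0, 10, is_last=True,
                target_bitrate=1000.0, encoded_bitrate=990.0)
```

Because the error term is scaled by 100, even a 1% bitrate miss outweighs the quality term on the last picture in this example.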

[0082] In some embodiments, a set of training data, including state parameters, quantization parameters, reward, and next state parameters, is stored in the replay memory 413 of entity 411. For example, the set of training data $(S_i, A_i, R_i, S_{i+1})$ is stored in the replay memory 413.

[0083] In some embodiments, if the training data in the replay memory 413 exceeds a data quantity threshold, the initialized perceptron can be updated based on a portion of the training data. For example, when the amount of training data in the replay memory 413 exceeds a defined training quantity, that quantity of training data is randomly selected to train the perceptron 412. In some embodiments, backpropagation can be used to update / optimize the perceptron 412. The loss function for backpropagation can use the mean squared error, calculated as follows:

$$L = \left| Q^{\pi}(S_i, A_i, \theta) - y' \right|^2 \quad (3)$$

$$y' = \begin{cases} R_i, & i = N \\ R_i + {Q'}^{\pi}(S_{i+1}, A_i, \theta'), & \text{others} \end{cases} \quad (4)$$

where ${Q'}^{\pi}(S_i, A_i, \theta')$ is a target network, initially constructed by replicating the perceptron $Q^{\pi}(S_i, A_i, \theta)$, and $y'$ is the target value, i.e., the label value used in the algorithm to update the value network (such as the Q-network). It is equivalent to an approximation of the "true value" and is used to calculate the loss between the predicted value and the target value. $i$ represents the time step index. The interaction process of reinforcement learning unfolds step by step: at step $i$ the entity is in state $S_i$, executes action $A_i$, receives reward $R_i$, and then transitions to state $S_{i+1}$ at step $i+1$. $N$ represents the termination time step, i.e., the last step of an episode. When $i = N$, there are no subsequent states or actions, so the target value is determined solely by the reward of the current step, $R_i$. $R_i$ represents the immediate reward at step $i$, which is the feedback from the environment to the entity's action at step $i$. This is the core objective that the entity needs to maximize in reinforcement learning. ${Q'}^{\pi}(S_{i+1}, A_i, \theta')$ represents the estimated value of the parameterized action-value function, which is decomposed as follows. $Q^{\pi}(s, a)$ is the standard action-value function, representing the expected cumulative discounted reward an entity can obtain after performing action $a$ in state $s$ under policy $\pi$. The prime and the parameter $\theta'$ indicate that this is the output of a target network. In reinforcement learning (such as DQN), to ensure training stability, a target network is set up with the same structure as the main network but with more slowly updated parameters; $\theta'$ represents the parameters of the target network, distinct from the parameters $\theta$ of the main network. $S_{i+1}$ represents the state at step $i+1$, which is the next state reached after the action at step $i$ is executed. $A_i$ represents the action performed at step $i$. Note that in regular Q-learning, the action argument of the Q function here would be the action $A_{i+1}$ at step $i+1$.
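The target value of formula (4) and the loss of formula (3) can be checked with a small numeric sketch. The function names are illustrative. Following the text, no discount factor is applied to the target-network term; averaging the per-sample losses over a batch is an assumption.

```python
def td_target(reward, next_q_value, is_terminal):
    """Target value y' from formula (4): no bootstrap term on the last step."""
    return reward if is_terminal else reward + next_q_value

def squared_error(predicted_q, target):
    """Loss from formula (3) for a single sample."""
    return (predicted_q - target) ** 2

def batch_loss(samples):
    """Mean of the per-sample losses over a batch (averaging is an assumption)."""
    losses = [
        squared_error(q, td_target(r, q_next, done))
        for q, r, q_next, done in samples
    ]
    return sum(losses) / len(losses)

# Each tuple: (predicted Q, reward, target-network Q, is last step).
# The numbers are made up for illustration.
samples = [(1.0, 0.5, 2.0, False), (1.0, 0.5, 2.0, True)]
loss = batch_loss(samples)
```

In this example the non-terminal sample has target 2.5 and the terminal sample has target 0.5, so the two squared errors are 2.25 and 0.25.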

[0084] As an example only, if the amount of training data in replay memory 413 exceeds $BS$, then $BS$ samples are randomly drawn, and $Q^{\pi}(S_i, A_i, \theta)$ is optimized through backpropagation according to formula (3).

[0085] In some embodiments, after encoding and decoding all images of video 416, the performance corresponding to the current set of bitrate control parameters can be determined. In this case, if the performance corresponding to the current set of bitrate control parameters is better than the performance corresponding to the previous set of bitrate control parameters, the current set of bitrate control parameters is determined as the optimal set of bitrate control parameters. For example, the criterion for recording a bitrate control scheme is that its performance is the best among all schemes seen during training, with the performance calculated according to formula (5), where $BR_t$ represents the target video bitrate, $BR_e$ represents the encoded video bitrate, $Q_i$ represents the quality of the encoded image, and $N$ represents the number of images in the input and output videos.

[0086] In some embodiments, the above process is repeated until the defined training period is completed. As an example only, after encoding all images in the video, the performance of the current video bitrate control scheme is calculated according to formula (5). If the performance is better than that previously recorded, the recorded bitrate control scheme is updated with the current one to obtain the new $SL_{expl}$.
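The record-keeping step can be sketched as a running maximum over episodes. The performance scores below are made-up numbers standing in for the output of formula (5), and the sketch assumes that a higher score means better performance.

```python
def update_best(best, scheme, performance):
    """Keep the scheme with the highest performance seen so far.

    `best` is a (scheme, performance) pair, or None before the first episode.
    """
    if best is None or performance > best[1]:
        return (list(scheme), performance)
    return best

best = None
# Hypothetical per-episode results: (QP sequence, performance score).
episodes = [([30, 32, 31], 0.61), ([28, 29, 30], 0.74), ([35, 35, 35], 0.40)]
for qps, perf in episodes:
    best = update_best(best, qps, perf)
sl_expl, best_perf = best
```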

[0087] If the training period of entity 411 is equal to the update period of the target network, the updated perceptron network parameters can be copied to the target network. For example, after a predetermined number of training periods, the updated network parameters of the perceptron $Q^{\pi}(S_i, A_i, \theta)$ can be obtained and copied into the target network. The predetermined number of training periods can be any suitable number; for example, it can be determined according to the actual situation. In some embodiments, when the training period is a multiple of $E_u$, the network parameters of $Q^{\pi}(S_i, A_i, \theta)$ are copied into the target network ${Q'}^{\pi}(S_{i+1}, A_i, \theta')$.
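A minimal sketch of this periodic copy is given below. The dictionary-of-lists parameter container and the function name are hypothetical; the only point illustrated is that the copy happens when the training period index is a multiple of the update period, and that the copy is independent of later changes to the main network.

```python
import copy

def maybe_sync_target(episode, update_period, online_params, target_params):
    """Return the target parameters to use after this episode.

    When `episode` is a multiple of `update_period`, the online parameters
    are deep-copied into the target network; otherwise the target network
    is left unchanged. Episodes are assumed to be counted from 1.
    """
    if episode % update_period == 0:
        return copy.deepcopy(online_params)
    return target_params

online = {"w": [1.0, 2.0]}
target = {"w": [0.0, 0.0]}
target = maybe_sync_target(3, 5, online, target)
unchanged = target["w"][:]  # episode 3 is not a multiple of 5
target = maybe_sync_target(5, 5, online, target)
online["w"][0] = 9.0  # later training must not leak into the target copy
```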

[0088] In certain embodiments, the exploration process 410 can be repeated. For example, it can be repeated for $E_n$ training periods.

[0089] The exploitation process 420 produces its output using the trained entity. During the exploitation process 420, for a conversion between a video unit of the video and a bitstream of the video, a first set of bitrate control parameters for the video unit can be determined based on the trained perceptron 422 in entity 411. For example, the bitrate control scheme $SL_{explt}$ is output according to the action-value function $Q^{\pi}(S_i, A_i, \theta)$ and is determined as follows:

$$SL_{explt} = \{A_i \mid i = 1, \ldots, N\} \quad (6)$$

where $SL_{explt}$ represents the first set of bitrate control parameters, $A_i$ represents the quantization parameter, $N$ represents the number of pictures in the video, $\theta^{*}$ represents the weights of the perceptron, $\pi^{*}$ represents the optimal policy, and $\mathop{\arg\max}\limits_{A}$ is the operator that returns the argument $A$ maximizing its operand.
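Formula (6) lists the chosen quantization parameters, but the text here does not reproduce how each one is picked. Given that an arg-max operator over the action is defined, the sketch below assumes each quantization parameter is the highest-valued action under the trained perceptron. Both callables and the toy state are hypothetical stand-ins for the trained perceptron and the video encoder.

```python
def greedy_rollout(q_function, initial_state, step, num_pictures):
    """Build a QP sequence by taking the highest-valued QP at every picture.

    q_function(state) -> list of action values, one per candidate QP.
    step(state, qp)   -> next state after encoding one picture with `qp`.
    """
    scheme, state = [], initial_state
    for _ in range(num_pictures):
        values = q_function(state)
        qp = max(range(len(values)), key=values.__getitem__)
        scheme.append(qp)
        state = step(state, qp)
    return scheme

def toy_q(state):
    # Toy stand-in: the preferred QP index alternates with the picture index.
    return [0.0, 1.0, 0.5] if state % 2 == 0 else [0.0, 0.5, 1.0]

def toy_step(state, qp):
    # Toy stand-in: the state is just the picture index.
    return state + 1

sl_explt = greedy_rollout(toy_q, initial_state=0, step=toy_step, num_pictures=4)
```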

[0090] A third set of bitrate control parameters is determined based on the first set of bitrate control parameters and a second set of bitrate control parameters. The second set of bitrate control parameters is the optimal set of bitrate control parameters obtained in the process of training the entity (i.e., the exploration process 410). For example, the third set of bitrate control parameters is determined according to formula (8), where $SL_{ao}$ represents the third set of bitrate control parameters, which is the finally selected optimal sequence, $SL_{expl}$ represents the second set of bitrate control parameters, $SL_{explt}$ represents the first set of bitrate control parameters, and $P(SL)$ represents the probability of the sequence $SL$, indicating the rationality, effectiveness, or likelihood of the sequence under the current task model. As an example only, as shown in Figure 4, the final bitrate control scheme can be established from $SL_{explt}$ and $SL_{expl}$ according to formula (8).
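Formula (8) is not reproduced in the text. One plausible reading, assumed here and not confirmed by the source, is that the final scheme is whichever of the two candidates scores higher under P(SL). In the sketch below, `score` is a hypothetical callable standing in for P(SL), and the toy scoring rule and QP sequences are invented purely for illustration.

```python
def select_final_scheme(sl_expl, sl_explt, score):
    """Return whichever candidate scheme scores higher under `score`.

    Ties go to the exploration result, because max() keeps the first
    of several equal maxima.
    """
    return max((sl_expl, sl_explt), key=score)

def toy_score(scheme):
    # Toy stand-in for P(SL): prefer an average QP close to 30.
    return -abs(sum(scheme) / len(scheme) - 30)

sl_expl = [28, 29, 30]   # hypothetical exploration result
sl_explt = [30, 31, 30]  # hypothetical exploitation result
sl_ao = select_final_scheme(sl_expl, sl_explt, toy_score)
```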

[0091] The conversion is performed based on the third set of bitrate control parameters. In this way, the bitrate control scheme found has a bitrate error and an encoding quality that are very close to those of the optimal scheme.

[0092] Figure 5 shows a flowchart of a method 500 for video processing according to an embodiment of the present disclosure. Method 500 is implemented during the conversion between video units of a video and a bitstream of a video.

[0093] At block 510, for a conversion between a video unit of the video and a bitstream of the video, a first set of bitrate control parameters for the video unit is determined based on the perceptron in the entity. In some embodiments, the conversion includes encoding the video unit into the bitstream. In other embodiments, the conversion includes decoding the video unit from the bitstream.

[0094] At block 520, a third set of bitrate control parameters is determined based on the first and second sets of bitrate control parameters. The second set of bitrate control parameters is the optimal set of bitrate control parameters obtained during training of the entity.

[0095] At block 530, the conversion is performed based on the third set of bitrate control parameters.

[0096] In some embodiments, the inputs for training the entity include one or more of the following: the video, the target bitrate, the total training period of the entity, the replay memory capacity of the entity, the amount of training data of the entity, or the update period of the target network.

[0097] In some embodiments, the entity also includes a replay memory.

[0098] In some embodiments, method 500 further includes: training the entity, wherein training the entity includes iteratively performing the following steps until the total training period of the entity is reached: determining state parameters based on the bit consumption of the video, the compression quality of the encoded image, and the number of images to be encoded; determining quantization parameters based on the state parameters and an initialized perceptron; encoding the next image to be encoded based on the quantization parameters; determining a reward for the quantization parameters and a next state parameter based on the encoding quality and bitrate error; storing a set of training data including the state parameters, quantization parameters, reward, and next state parameter into the entity's replay memory; and, in response to the training data in the replay memory exceeding a data quantity threshold, updating the initialized perceptron based on a portion of the training data in the replay memory.
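The order of the training steps listed above can be sketched as a loop skeleton. This is an outline under stated assumptions, not the disclosed implementation: the three callables are hypothetical stand-ins for the perceptron, the video encoder with its reward computation, and the perceptron update, and the initial state of 0 is a toy placeholder.

```python
import random
from collections import deque

def explore(num_episodes, num_pictures, batch_size, capacity,
            q_values, encode, train_step, rng, threshold=0.5):
    """Skeleton of the iterative training steps.

    q_values(state)   -> list of action values (hypothetical perceptron).
    encode(state, qp) -> (reward, next_state) (hypothetical encoder call).
    train_step(batch) -> None (hypothetical perceptron update).
    """
    memory = deque(maxlen=capacity)
    updates = 0
    for _ in range(num_episodes):
        state = 0  # toy initial state
        for _ in range(num_pictures):
            values = q_values(state)
            if rng.random() > threshold:
                qp = max(range(len(values)), key=values.__getitem__)
            else:
                qp = rng.randrange(len(values))
            reward, next_state = encode(state, qp)
            memory.append((state, qp, reward, next_state))
            # Train only once the memory holds more than one batch.
            if len(memory) > batch_size:
                train_step(rng.sample(list(memory), batch_size))
                updates += 1
            state = next_state
    return memory, updates

batches = []
memory, updates = explore(
    num_episodes=2, num_pictures=3, batch_size=2, capacity=4,
    q_values=lambda s: [0.2, 0.9, 0.1],
    encode=lambda s, qp: (0.0, s + 1),
    train_step=batches.append,
    rng=random.Random(0),
)
```

With these toy settings, six samples are produced in total; the memory first exceeds the batch size on the third sample, so four updates are performed.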

[0099] In some embodiments, method 500 further includes: determining the performance corresponding to the current set of bitrate control parameters in response to encoding and decoding all images of the video; and determining the current set of bitrate control parameters as the second set of bitrate control parameters in response to the performance corresponding to the current set of bitrate control parameters being better than the performance corresponding to the previous set of bitrate control parameters.

[0100] In some embodiments, method 500 further includes: in response to the training period of the entity being equal to the update period of the target network, copying the updated network parameters of the perceptron to the target network.

[0101] In some embodiments, method 500 further includes: obtaining an initialized perceptron based on random weights; and obtaining a target network based on random weights.

[0102] In some embodiments, determining the quantization parameters based on the state parameters and the initialized perceptron includes: feeding the state parameters into the initialized perceptron to obtain an action value list, wherein the action value list includes one or more quantization parameters and a reward corresponding to each quantization parameter; and determining the quantization parameters based on the action value list and a search strategy.

[0103] In some embodiments, determining the quantization parameter based on the action value list and the search strategy includes: sampling a probability value from a uniform distribution; and, in response to the probability value being greater than a probability threshold, determining the quantization parameter with the maximum reward in the action value list as the quantization parameter; or, in response to the probability value being less than the probability threshold, randomly determining the quantization parameter from the action value list.

[0104] In some embodiments, updating the initialized perceptron based on a portion of the training data includes: updating the initialized perceptron using backpropagation.

[0105] In some embodiments, the state parameter is determined according to formula (1), where $S_i$ represents the state parameter, $BR_t$ represents the target video bitrate, $FPS$ represents the video frame rate, $N$ represents the number of images in the video, $Q_i$ represents the quality of the encoded image, and $B_j$ represents the number of bits consumed by the encoded j-th image, where i and j are integers.

[0106] In some embodiments, the first set of bitrate control parameters is: $SL_{explt} = \{A_i \mid i = 1, \ldots, N\}$, where $SL_{explt}$ represents the first set of bitrate control parameters, $A_i$ represents the quantization parameter, $N$ represents the number of images in the video, and $\theta^{*}$ represents the weights of the perceptron.

[0107] In some embodiments, the third set of bitrate control parameters is determined according to formula (8), where $SL_{expl}$ represents the second set of bitrate control parameters and $SL_{explt}$ represents the first set of bitrate control parameters.

[0108] According to another embodiment of this disclosure, a non-transitory computer-readable recording medium is provided for storing a bitstream of video generated by a method performed by an apparatus for video processing. The method includes: determining a first set of bitrate control parameters for video units of the video based on a perceptron in an entity; determining a third set of bitrate control parameters based on the first set of bitrate control parameters and a second set of bitrate control parameters, wherein the second set of bitrate control parameters is an optimal set of bitrate control parameters obtained during the training of the entity; and generating a bitstream of video units based on the third set of bitrate control parameters.

[0109] According to further embodiments of this disclosure, a method for storing a bitstream of video is provided. The method includes: determining a first set of bitrate control parameters for video units of the video based on a perceptron in an entity; determining a third set of bitrate control parameters based on the first set of bitrate control parameters and a second set of bitrate control parameters, wherein the second set of bitrate control parameters is the optimal set of bitrate control parameters obtained during the training of the entity; generating a bitstream of the video units based on the third set of bitrate control parameters; and storing the bitstream in a non-transitory computer-readable recording medium.

[0110] The embodiments of this disclosure can be described according to the following entries, and their features can be combined in any reasonable manner.

[0111] Item 1. A method for video processing, comprising: for a conversion between a video unit of a video and a bitstream of the video, determining a first set of bitrate control parameters for the video unit based on a perceptron in an entity; determining a third set of bitrate control parameters based on the first set of bitrate control parameters and a second set of bitrate control parameters, wherein the second set of bitrate control parameters is an optimal set of bitrate control parameters obtained during training of the entity; and performing the conversion based on the third set of bitrate control parameters.

[0112] Item 2. The method according to Item 1, wherein the input for training the entity includes one or more of the following: the video, the target bitrate, the total training period of the entity, the replay memory capacity of the entity, the amount of training data of the entity, or the update period of the target network.

[0113] Item 3. The method according to Item 1 or 2, wherein the entity further includes a replay memory.

[0114] Item 4. The method according to any one of items 1-3, further comprising: training the entity, wherein training the entity includes iteratively performing the following steps until a total training period of the entity is reached: determining state parameters based on the bit consumption of the video, the compression quality of the encoded image, and the number of images to be encoded; determining quantization parameters based on the state parameters and an initialized perceptron; encoding the next image to be encoded based on the quantization parameters; determining a reward for the quantization parameters and a next state parameter based on the encoding quality and bitrate error; storing a set of training data including the state parameters, the quantization parameters, the reward, and the next state parameter into the replay memory of the entity; and, in response to the training data in the replay memory exceeding a data quantity threshold, updating the initialized perceptron based on a portion of the training data in the replay memory.

[0115] Item 5. The method according to Item 4 further includes: in response to encoding and decoding all images of the video, determining the performance corresponding to a current set of bitrate control parameters; and in response to the performance corresponding to the current set of bitrate control parameters being better than the performance corresponding to a previous set of bitrate control parameters, determining the current set of bitrate control parameters as a second set of bitrate control parameters.

[0116] Item 6. The method according to Item 4 further includes: in response to the training period of the entity being equal to the update period of the target network, copying the updated network parameters of the perceptron to the target network.

[0117] Item 7. The method according to Item 4 further includes: obtaining the initialized perceptron based on random weights; and obtaining the target network based on random weights.

[0118] Item 8. The method according to Item 4, wherein determining the quantization parameter based on the state parameter and the initialized perceptron comprises: feeding the state parameter into the initialized perceptron to obtain an action value list, wherein the action value list includes one or more quantization parameters and a reward corresponding to each quantization parameter; and determining the quantization parameter based on the action value list and a search strategy.

[0119] Item 9. The method according to Item 8, wherein determining the quantization parameter based on the action value list and the search strategy includes: sampling a probability value from a uniform distribution; and, in response to the probability value being greater than a probability threshold, determining the quantization parameter with the maximum reward in the action value list as the quantization parameter; or, in response to the probability value being less than the probability threshold, randomly determining the quantization parameter from the action value list.

[0120] Item 10. The method according to Item 4, wherein updating the initialized perceptron based on a portion of the training data comprises: updating the initialized perceptron using backpropagation.

[0121] Item 11. The method according to Item 4, wherein the state parameter is determined according to formula (1), where $S_i$ represents the state parameter, $BR_t$ represents the target video bitrate, $FPS$ represents the video frame rate, $N$ represents the number of images in the video, $Q_i$ represents the quality of the encoded image, and $B_j$ represents the number of bits consumed by the encoded j-th image, where i and j are integers.

[0122] Item 12. The method according to any one of items 1-11, wherein the first set of bitrate control parameters is: $SL_{explt} = \{A_i \mid i = 1, \ldots, N\}$, where $SL_{explt}$ represents the first set of bitrate control parameters, $A_i$ represents the quantization parameter, $N$ represents the number of images in the video, and $\theta^{*}$ represents the weights of the perceptron.

[0123] Item 13. The method according to any one of items 1-12, wherein the third set of bitrate control parameters is determined according to formula (8), where $SL_{expl}$ represents the second set of bitrate control parameters and $SL_{explt}$ represents the first set of bitrate control parameters.

[0124] Item 14. The method according to any one of items 1-13, wherein the conversion includes encoding the video unit into the bitstream.

[0125] Item 15. The method according to any one of items 1-13, wherein the conversion includes decoding the video unit from the bitstream.

[0126] Item 16. An apparatus for video processing, comprising a processor and a non-transitory memory having instructions thereon, wherein the instructions, when executed by the processor, cause the processor to perform a method according to any one of Items 1-14.

[0127] Item 17. A non-transitory computer-readable storage medium storing instructions that cause a processor to execute the method according to any one of Items 1-14.

[0128] Item 18. A non-transitory computer-readable recording medium storing a bitstream of video generated by a method performed by a video processing apparatus, wherein the method includes: determining a first set of bitrate control parameters for video units of the video based on a perceptron in an entity; determining a third set of bitrate control parameters based on the first set of bitrate control parameters and a second set of bitrate control parameters, wherein the second set of bitrate control parameters is an optimal set of bitrate control parameters during the training of the entity; and generating the bitstream of the video units based on the third set of bitrate control parameters.

[0129] Item 19. A method for storing a bitstream of video, comprising: determining a first set of bitrate control parameters for video units of the video based on a perceptron in an entity; determining a third set of bitrate control parameters based on the first set of bitrate control parameters and a second set of bitrate control parameters, wherein the second set of bitrate control parameters is an optimal set of bitrate control parameters obtained during training of the entity; generating the bitstream of the video units based on the third set of bitrate control parameters; and storing the bitstream in a non-transitory computer-readable recording medium.

Example Device

[0130] Figure 6 shows a block diagram of a computing device 600 in which various embodiments of the present disclosure may be implemented. The computing device 600 may be implemented as a source device 110 (or video encoder 114 or 200) or a destination device 120 (or video decoder 124 or 300), or may be included in a source device 110 (or video encoder 114 or 200) or a destination device 120 (or video decoder 124 or 300).

[0131] It should be understood that the computing device 600 shown in Figure 6 is for illustrative purposes only and is not intended to imply any limitation on the functionality and scope of the embodiments of this disclosure.

[0132] As shown in Figure 6, the computing device 600 takes the form of a general-purpose computing device. The computing device 600 may include at least one or more processors or processing units 610, a memory 620, a storage unit 630, one or more communication units 640, one or more input devices 650, and one or more output devices 660.

[0133] In some embodiments, computing device 600 can be implemented as any user terminal or server terminal with computing capabilities. The server terminal can be a server provided by a service provider, a large computing device, etc. The user terminal can be, for example, any type of mobile terminal, fixed terminal, or portable terminal, including mobile phones, stations, units, devices, multimedia computers, multimedia tablet computers, internet nodes, communicators, desktop computers, laptop computers, notebook computers, netbook computers, tablet computers, personal communication system (PCS) devices, personal navigation devices, personal digital assistants (PDAs), audio / video players, digital cameras / camcorders, positioning devices, television receivers, radio receivers, e-book devices, gaming devices, or any combination thereof, and includes accessories and peripherals of these devices, or any combination thereof. It is conceivable that computing device 600 can support any type of interface to the user (such as "wearable" circuitry devices, etc.).

[0134] Processing unit 610 can be a physical processor or a virtual processor, and can perform various processes based on programs stored in memory 620. In a multiprocessor system, multiple processing units execute computer-executable instructions in parallel to improve the parallel processing capabilities of computing device 600. Processing unit 610 may also be referred to as a central processing unit (CPU), microprocessor, controller, or microcontroller.

[0135] Computing device 600 typically includes various computer storage media. Such media can be any media accessible by computing device 600, including but not limited to volatile and non-volatile media, or removable and non-removable media. Memory 620 can be volatile memory (e.g., registers, cache, random access memory (RAM)), non-volatile memory (e.g., read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), or flash memory), or any combination thereof. Storage unit 630 can be any removable or non-removable media and may include machine-readable media, such as memory, flash drives, disks, or other media that can be used to store information and / or data and can be accessed within computing device 600.

[0136] The computing device 600 may also include additional removable / non-removable storage media, and volatile / non-volatile storage media. Although not shown in Figure 6, a disk drive for reading from and / or writing to a removable non-volatile disk, and an optical disk drive for reading from and / or writing to a removable non-volatile optical disk may be provided. In this case, each drive may be connected to a bus (not shown) via one or more data media interfaces.

[0137] Communication unit 640 communicates with another computing device via a communication medium. Furthermore, the functionality of the components in computing device 600 can be implemented by a single computing cluster or by multiple computing machines communicating via communication connections. Therefore, computing device 600 can operate in a networked environment using logical connections to one or more other servers, networked personal computers (PCs), or other general-purpose network nodes.

[0138] Input device 650 can be one or more of various input devices, such as a mouse, keyboard, trackball, voice input device, etc. Output device 660 can be one or more of various output devices, such as a monitor, speaker, printer, etc. With the aid of communication unit 640, computing device 600 can also communicate with one or more external devices (not shown), such as storage devices and display devices. Computing device 600 can also communicate with one or more devices that enable a user to interact with computing device 600, or any device that enables computing device 600 to communicate with one or more other computing devices (e.g., network card, modem, etc.), if needed. Such communication can be performed via an input / output (I / O) interface (not shown).

[0139] In some embodiments, some or all components of computing device 600 may not be integrated into a single device, but may be deployed within a cloud computing architecture. In a cloud computing architecture, components may be provided remotely and work together to achieve the functionality described herein. In some embodiments, cloud computing provides computing, software, data access, and storage services without requiring end users to know the physical location or configuration of the systems or hardware providing these services. In various embodiments, cloud computing provides services via a wide area network (WAN), such as the Internet, using suitable protocols. For example, a cloud computing provider offers applications via a WAN that can be accessed through a web browser or any other computing component. The software or components of the cloud computing architecture, along with the corresponding data, may be stored on servers at remote locations. Computing resources in a cloud computing environment may be consolidated or distributed across locations in remote data centers. Cloud computing infrastructure may provide services through shared data centers, although they may appear as a single access point for users. Therefore, cloud computing architectures can be used to provide the components and functionality described herein from service providers at remote locations. Alternatively, they may be provided from traditional servers or installed directly or otherwise on client devices.

[0140] In embodiments of this disclosure, computing device 600 can be used to implement video encoding / decoding. Memory 620 may include one or more video codec modules 625 having one or more program instructions. These modules can be accessed and executed by processing unit 610 to perform the functions of the various embodiments described herein.

[0141] In an example embodiment of performing video encoding, input device 650 may receive video data as input 670 to be encoded. The video data may be processed, for example, by video codec module 625 to generate an encoded bitstream. The encoded bitstream may be provided as output 680 via output device 660.

[0142] In an example embodiment of performing video decoding, input device 650 may receive an encoded bitstream as input 670. The encoded bitstream may be processed, for example, by video codec module 625 to generate decoded video data. The decoded video data may be provided as output 680 via output device 660.

[0143] While this disclosure has been specifically shown and described with reference to preferred embodiments, those skilled in the art will understand that various changes in form and detail may be made without departing from the spirit and scope of this application as defined by the appended claims. These changes are intended to be covered by the scope of this application. Therefore, the foregoing description of embodiments of this application is not intended to be limiting.

Claims

1. A method for video processing, comprising: for a conversion between a video unit of a video and a bitstream of the video, determining a first set of bitrate control parameters for the video unit based on a perceptron in an entity; determining a third set of bitrate control parameters based on the first set of bitrate control parameters and a second set of bitrate control parameters, wherein the second set of bitrate control parameters is an optimal set of bitrate control parameters during training of the entity; and performing the conversion based on the third set of bitrate control parameters.

2. The method according to claim 1, wherein the input for training the entity includes one or more of the following: the video, the target bitrate, the total training period of the entity, the playback memory capacity of the entity, the amount of training data of the entity, or the update period of the target network.

3. The method according to claim 1 or 2, wherein the entity further includes playback memory.

4. The method according to any one of claims 1-3, further comprising: training the entity, wherein training the entity includes iteratively performing the following steps until the total training period of the entity is reached: determining state parameters based on the bit consumption of the video, the compression quality of the encoded image, and the number of images to be encoded; determining quantization parameters based on the state parameters and an initialized perceptron; encoding the next image to be encoded based on the quantization parameters; determining a reward for the quantization parameters and next state parameters based on the encoding quality and the bitrate error; storing a set of training data, including the state parameters, the quantization parameters, the reward, and the next state parameters, in the playback memory of the entity; and in response to the training data in the playback memory exceeding a data quantity threshold, updating the initialized perceptron based on a portion of the training data.

5. The method according to claim 4, further comprising: in response to all images of the video being encoded and decoded, determining the performance corresponding to a current set of bitrate control parameters; and in response to the performance corresponding to the current set of bitrate control parameters being better than the performance corresponding to a previous set of bitrate control parameters, determining the current set of bitrate control parameters as the second set of bitrate control parameters.

6. The method according to claim 4, further comprising: in response to the training period of the entity being equal to the update period of the target network, copying the updated network parameters of the perceptron to the target network.

7. The method according to claim 4, further comprising: obtaining the initialized perceptron based on random weights; and obtaining the target network based on random weights.

8. The method of claim 4, wherein determining the quantization parameters based on the state parameters and the initialized perceptron comprises: feeding the state parameters into the initialized perceptron to obtain an action value list, wherein the action value list includes one or more quantization parameters and a reward corresponding to each quantization parameter; and determining the quantization parameters based on the action value list and a search strategy.

9. The method of claim 8, wherein determining the quantization parameter based on the action value list and the search strategy includes: sampling a probability value from a uniform distribution; and in response to the probability value being greater than a probability threshold, determining the quantization parameter with the highest reward in the action value list as the quantization parameter; or in response to the probability value being less than the probability threshold, randomly determining the quantization parameter from the action value list.

10. The method of claim 4, wherein updating the initialized perceptron based on a portion of the training data comprises: updating the initialized perceptron using backpropagation.

11. The method of claim 4, wherein the state parameter is determined as: where S_i denotes the state parameter, BR_t denotes the target video bitrate, FPS denotes the frame rate of the video, N denotes the number of pictures of the video, Q_i denotes the quality of the encoded picture, B_j denotes the number of bits consumed by the j-th encoded picture, and i and j are integers.

12. The method of any one of claims 1-11, wherein the first set of rate control parameters is: SL_explt = {A_i | i = 1, ..., N}, wherein SL_explt represents the first set of rate control parameters, A_i represents a quantization parameter, N represents the number of pictures of the video, and θ* represents the weights of the perceptron.

13. The method according to any one of claims 1-12, wherein the third set of bit rate control parameters is: wherein SL_expl represents the second set of rate control parameters and SL_explt represents the first set of rate control parameters.

14. The method according to any one of claims 1-13, wherein the conversion comprises encoding the video unit into the bitstream.

15. The method according to any one of claims 1-13, wherein the conversion comprises decoding the video unit from the bitstream.

16. An apparatus for video processing, comprising a processor and a non-transitory memory having instructions thereon, wherein the instructions, when executed by the processor, cause the processor to perform the method according to any one of claims 1-14.

17. A non-transitory computer-readable storage medium storing instructions that cause a processor to perform the method according to any one of claims 1-14.

18. A non-transitory computer-readable recording medium storing a bitstream of a video generated by a method performed by a video processing apparatus, wherein the method comprises: determining a first set of bitrate control parameters for a video unit of the video based on a perceptron in an entity; determining a third set of bitrate control parameters based on the first set of bitrate control parameters and a second set of bitrate control parameters, wherein the second set of bitrate control parameters is an optimal set of bitrate control parameters during training of the entity; and generating the bitstream of the video unit based on the third set of bitrate control parameters.

19. A method for storing a bitstream of a video, comprising: determining a first set of bitrate control parameters for a video unit of the video based on a perceptron in an entity; determining a third set of bitrate control parameters based on the first set of bitrate control parameters and a second set of bitrate control parameters, wherein the second set of bitrate control parameters is an optimal set of bitrate control parameters during training of the entity; generating the bitstream of the video unit based on the third set of bitrate control parameters; and storing the bitstream in a non-transitory computer-readable recording medium.
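
By way of non-limiting illustration only, the action selection of claims 8 and 9, the playback memory of claims 3 and 4, and the target network update condition of claim 6 might be sketched in Python as follows. The names `select_qp`, `PlaybackMemory`, `should_sync_target`, and `epsilon` are hypothetical and do not appear in this disclosure. The eviction of the oldest entries once the memory is full, and the reading of claim 6 as a periodic update, are assumptions made for the sketch. The perceptron itself and its backpropagation update are omitted.

```python
import random
from collections import deque


def select_qp(action_values, epsilon, rng=random):
    """Pick a quantization parameter (QP) from an action value list.

    action_values maps each candidate QP to its estimated reward.
    epsilon plays the role of the probability threshold (name assumed).
    """
    p = rng.random()  # probability value sampled from a uniform distribution
    if p > epsilon:
        # Above the threshold: take the QP with the highest estimated reward.
        return max(action_values, key=action_values.get)
    # Otherwise: draw a QP at random from the action value list.
    return rng.choice(list(action_values))


class PlaybackMemory:
    """Fixed-capacity store of (state, qp, reward, next_state) tuples."""

    def __init__(self, capacity):
        # Assumption: once full, the oldest entries are discarded first.
        self._items = deque(maxlen=capacity)

    def store(self, state, qp, reward, next_state):
        self._items.append((state, qp, reward, next_state))

    def __len__(self):
        return len(self._items)

    def sample(self, batch_size, rng=random):
        # A portion of the stored training data, drawn without replacement.
        return rng.sample(list(self._items), batch_size)


def should_sync_target(episode, target_update_period):
    """Assumed reading of claim 6: copy weights every target_update_period episodes."""
    return episode > 0 and episode % target_update_period == 0


if __name__ == "__main__":
    values = {22: 0.1, 27: 0.9, 32: 0.4}  # illustrative QPs and rewards
    memory = PlaybackMemory(capacity=3)
    for step in range(5):
        qp = select_qp(values, epsilon=0.2)
        memory.store(step, qp, values[qp], step + 1)
    print(len(memory), should_sync_target(10, 5))
```

In such a sketch, a training loop would call `select_qp` once per image to be encoded, store the resulting tuple, and draw a batch with `sample` only after `len(memory)` exceeds the data quantity threshold.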