Video encoding method and apparatus

By segmenting the video into segments and adjusting the encoding parameters, the problem of image quality degradation in dynamic regions during video encoding is solved, achieving stable video quality and efficient encoding.

WO2026138025A1 · PCT designated stage · Publication Date: 2026-07-02 · SHANGHAI HODE INFORMATION TECH CO LTD

Patent Information

Authority / Receiving Office: WO · WO
Patent Type: Applications
Current Assignee / Owner: SHANGHAI HODE INFORMATION TECH CO LTD
Filing Date: 2025-09-23
Publication Date: 2026-07-02

AI Technical Summary

Technical Problem

When a video with a static or slightly moving background and a rapidly moving foreground is encoded with existing technologies, distortion in the dynamic areas has a relatively small impact on the quality score of the overall image. As a result, the dynamic areas that the human eye focuses on suffer image quality degradation, such as blurring and blockiness, even though the overall score appears acceptable.

Method used

The video is divided into multiple video segments. By calculating the proportion of dynamic regions, the encoding parameters are adjusted to improve the encoding quality of dynamic regions and ensure the overall image quality stability.

Benefits of technology

It improves the encoding quality of dynamic regions, avoids image quality degradation in areas of human visual focus, and maintains the stability of the actual image quality of the video.

Abstract

Embodiments of the present application relate to the technical field of video encoding and decoding, and provide a video encoding method. The video encoding method comprises: segmenting a video into a plurality of original video clips, the plurality of original video clips including a target video clip; determining a dynamic area proportion of the target video clip, the dynamic area proportion being the proportion of a dynamic area in a video picture; when the dynamic area proportion is less than a preset proportion threshold, determining a first encoding parameter of the target video clip; and encoding the target video clip on the basis of the first encoding parameter. The technical solution of the embodiments of the present application can improve the encoding quality of the dynamic area and maintain the stability of the actual image quality.

Description

Video encoding method and apparatus

[0001] This application claims priority to Chinese patent application No. 202411929998.3, filed on December 25, 2024, entitled "Video Coding Method and Apparatus", the entire contents of which are incorporated herein by reference.

[0002] Technical Field

[0003] This application relates to the field of multimedia technology, and in particular to a video encoding method, apparatus, computer equipment, computer-readable storage medium, and computer program product.

Background Technology

[0004] During video production, to provide users with a stable viewing experience, the video quality after encoding is controlled to meet a preset target during video transcoding. This includes controlling the VMAF (Video Multimethod Assessment Fusion) metric after video encoding. When evaluating the quality of encoded video, VMAF first calculates the local distortion in the image, then averages the local distortions to obtain the overall image distortion, and uses this average to evaluate the video quality.

[0005] The inventors have discovered that, in practical use, for videos with a slightly moving or static background but a rapidly moving foreground, the static or slowly moving areas occupy the larger proportion of the picture and are relatively easy to encode, resulting in minimal distortion. Meanwhile, the dynamic area occupies a smaller proportion, so distortion in the dynamic area has a relatively small impact on the VMAF of the overall image. In this case, the video quality meets requirements according to the metric, but the dynamic areas that the human eye focuses on exhibit image quality degradation such as blurring and blockiness.

[0006] It should be noted that the above content is not necessarily prior art, nor is it intended to limit the scope of patent protection of this application.

Summary of the Invention

[0007] This application provides a video encoding method, apparatus, computer device, computer-readable storage medium, and computer program product to solve or alleviate one or more of the technical problems mentioned above.

[0008] One aspect of this application provides a video encoding method, the method comprising:

[0009] The video is divided into multiple original video segments, and the multiple original video segments include the target video segment;

[0010] Determine the dynamic region proportion of the target video segment, whereby the dynamic region proportion is the percentage of the dynamic region in the video frame;

[0011] When the proportion of the dynamic region is lower than a preset proportion threshold, the first encoding parameters of the target video segment are determined; and

[0012] The target video segment is encoded according to the first encoding parameters.

[0013] Optionally, the video can be divided into multiple original video segments, including:

[0014] Identify multiple scenes in the video;

[0015] Based on the multiple scenarios, the video is divided into multiple original video segments; wherein, one original video segment corresponds to one scenario.

[0016] Optionally, determining the dynamic region proportion of the target video segment includes:

[0017] Multiple video frames in the target video segment are determined according to a preset interval;

[0018] Determine the optical flow value of each pixel in each of the video frames;

[0019] The dynamic region of the target video segment is determined based on the optical flow value of each pixel.

[0020] The proportion of the dynamic region of the target video segment is determined based on the dynamic region and the video frame region of the target video segment.

[0021] Optionally, determining the dynamic region of the target video segment based on the optical flow value of each pixel includes:

[0022] If the optical flow value of a pixel is greater than the preset optical flow threshold, the location of the pixel is determined to be a dynamic region.

[0023] Optionally, the method includes:

[0024] The preset optical flow threshold is adjusted based on the optical flow value of each pixel in each video frame.

[0025] Optionally, the method further includes:

[0026] Each of the original video segments is encoded according to preset encoding parameters to obtain multiple encoded video segments;

[0027] If the image quality of the target encoded video segment does not meet the preset conditions, the preset encoding parameters are adjusted to obtain the second encoding parameters; the target encoded video segment is any one of the plurality of encoded video segments;

[0028] The first original video segment corresponding to the target encoded video segment is encoded according to the second encoding parameters.

[0029] Optionally, determining the first encoding parameters of the target video segment includes:

[0030] If the target video segment is not the first original video segment, the first encoding parameters are obtained by adjusting the second encoding parameters.

[0031] When the target video segment is the first original video segment, the first encoding parameters are obtained by adjusting according to the preset encoding parameters.

[0032] Optionally, encoding the target video segment according to the first encoding parameters includes:

[0033] If the first encoding parameter is not less than the preset minimum encoding parameter, the target video segment is encoded according to the first encoding parameter;

[0034] If the first encoding parameter is less than the preset minimum encoding parameter, the target video segment is encoded according to the preset minimum encoding parameter.

[0035] Another aspect of this application provides a video encoding apparatus, the apparatus comprising:

[0036] A segmentation module is used to segment a video into multiple original video segments, wherein the multiple original video segments include a target video segment;

[0037] The first determining module is used to determine the dynamic region proportion of the target video segment, wherein the dynamic region proportion is the proportion of the dynamic region in the video frame;

[0038] The second determining module is used to determine the first encoding parameters of the target video segment when the proportion of the dynamic region is lower than a preset proportion threshold; and

[0039] The encoding module is used to encode the target video segment according to the first encoding parameters.

[0040] Another aspect of this application provides a computer device, including:

[0041] At least one processor; and

[0042] A memory that is communicatively connected to the at least one processor;

[0043] Wherein: the memory stores instructions that can be executed by the at least one processor, the instructions being executed by the at least one processor to enable the at least one processor to perform the method as described above.

[0044] Another aspect of this application provides a computer-readable storage medium storing computer instructions that, when executed by a processor, implement the method described above.

[0045] Another aspect of this application provides a computer program product including computer-readable instructions that, when executed by a processor, implement the method described above.

[0046] The embodiments of this application employing the above technical solution may include the following advantages: by first dividing the video into video segments, and then adjusting the encoding parameters of the video segments containing a small proportion of dynamic areas in the picture, and using the adjusted encoding parameters for encoding, the encoding quality of the dynamic areas can be improved, and the stability of the actual image quality can be maintained.

Attached Figure Description

[0047] The accompanying drawings exemplify embodiments and form part of the specification, serving together with the textual description to explain exemplary implementations of the embodiments. The illustrated embodiments are for illustrative purposes only and do not limit the scope of the claims. Throughout the drawings, the same reference numerals refer to similar but not necessarily identical elements.

[0048] Figure 1 schematically illustrates the operating environment of the video encoding method according to Embodiment 1 of this application;

[0049] Figure 2 schematically illustrates a flowchart of a video encoding method according to Embodiment 1 of this application;

[0050] Figure 3 schematically shows the flowchart of the sub-steps of step S200 in Figure 2;

[0051] Figure 4 schematically shows the flowchart of the sub-steps of step S202 in Figure 2;

[0052] Figure 5 schematically illustrates the newly added flowchart of the video encoding method according to Embodiment 1 of this application;

[0053] Figure 6 schematically shows the flowchart of the sub-steps of step S204 in Figure 2;

[0054] Figure 7 schematically shows the flowchart of the sub-steps of step S206 in Figure 2;

[0055] Figure 8 is an application example diagram of the video encoding method according to an embodiment of this application;

[0056] Figure 9 schematically shows a block diagram of a video encoding apparatus according to Embodiment 2 of this application; and

[0057] Figure 10 schematically illustrates a hardware architecture diagram of a computer device according to Embodiment 3 of this application.

Embodiments of the present invention

[0058] To make the objectives, technical solutions, and advantages of this application clearer, the following detailed description is provided in conjunction with the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative and not intended to limit the scope of this application. All other embodiments obtained by those skilled in the art based on the embodiments in this application without inventive effort are within the scope of protection of this application.

[0059] It should be noted that the descriptions involving "first," "second," etc., in the embodiments of this application are for descriptive purposes only and should not be construed as indicating or implying their relative importance or implicitly specifying the number of technical features indicated. Therefore, a feature defined with "first" or "second" may explicitly or implicitly include at least one of that feature. Furthermore, the technical solutions of the various embodiments can be combined with each other, but this must be based on the ability of those skilled in the art to implement them. If the combination of technical solutions is contradictory or impossible to implement, it should be considered that such a combination of technical solutions does not exist and is not within the scope of protection claimed in this application.

[0060] In the description of this application, it should be understood that the numerical labels before the steps do not indicate the order of the steps, but are only used to facilitate the description of this application and to distinguish each step, and therefore should not be construed as a limitation of this application.

[0061] First, a definition of the terminology used in this application is provided:

[0062] Video encoding is the process of converting raw video data into a compressed format for easier storage and transmission.

[0063] RF (Rate Factor) is an encoding parameter in CRF encoding mode that controls the bitrate and quality of the encoded file. For the same video, a higher RF results in a lower bitrate and lower image quality.

[0064] VMAF (Video Multimethod Assessment Fusion) is a full-reference assessment method for measuring the quality of encoded video.

[0065] Optical flow is a computer vision technique that represents the speed and direction of pixel movement between adjacent frames of a video.

[0066] Distortion refers to the deformation that occurs in an image during compression, transmission, or display, such as pixelation, blurring, or color distortion.

[0067] Command-line tools are tools that allow you to execute computer-readable instructions through a text-based command-line interface (CLI).

[0068] An API (Application Programming Interface) is a set of predefined functions, protocols, and tools used to build software and applications.

[0069] PSNR (Peak Signal-to-Noise Ratio) is a metric for evaluating image quality, used in image compression and various image processing applications.

[0070] SSIM (Structural Similarity Index) is a metric that measures the similarity between two images. It assesses image quality by comparing the brightness, contrast, and structural information of the images.

[0071] QP (Quantizer Parameter) is a parameter in video coding that determines the degree of image compression. A smaller QP value indicates a smaller quantization step size, resulting in higher image quality, but also an increased bitrate. Conversely, a larger QP value indicates a larger quantization step size, resulting in lower image quality and a lower bitrate.

[0072] Secondly, to facilitate understanding of the technical solutions provided in the embodiments of this application by those skilled in the art, the relevant technologies are described below:

[0073] During video production, to provide users with a stable viewing experience, the video quality after encoding is controlled to meet a preset target during video transcoding. For example, the VMAF (Video Multimethod Assessment Fusion) metric after video encoding is controlled. When evaluating the quality of encoded video, VMAF first calculates the local distortion in the image, then averages the local distortion to obtain the overall image distortion, and uses this average to evaluate the video quality.

[0074] However, the inventors discovered that for videos with a slightly moving or static background but a rapidly moving foreground, the static or slowly moving areas occupy the larger proportion of the picture and are relatively easy to encode, resulting in minimal distortion. The dynamic areas, which are more difficult to encode, constitute a smaller proportion, and distortion in these areas has a relatively small impact on the overall VMAF. In such cases, although the overall image quality of the encoded video meets VMAF requirements, the dynamic areas that the human eye focuses on may exhibit image quality degradation such as blurring and blockiness.

[0075] Therefore, this application provides a video encoding technology solution. In this solution, the video is first segmented into video segments, and then the encoding parameters of the video segments containing a small proportion of dynamic areas are adjusted. Encoding is then performed using the adjusted encoding parameters, which improves the encoding quality of the dynamic areas and maintains the stability of the actual image quality. Details are provided below.

[0076] Finally, for ease of understanding, an exemplary operating environment is provided below.

[0077] For example, the video encoding method can be used in computer devices 2 and 6 as shown in Figure 1. Computer device 6 can be configured to access server content (e.g., video) and services. Computer device 6 may include electronic devices with built-in or external display panels, such as mobile devices, tablets, laptops, workstations, virtual reality devices, gaming devices, digital streaming media devices, vehicle user terminals, smart TVs, set-top boxes, etc., and may also include virtualized computing instances. Virtualized computing instances may include virtual machines, such as simulations of computer systems, operating systems, servers, etc.

[0078] Computer device 6 can be associated with one or more users. A single user can also use one or more of computer devices 6 to access the server. Computer device 6 can travel to various locations and use different networks to access the server. Computer device 6 can include multiple client programs, such as video codecs, for providing encoding and decoding services. These video codecs can encode and compress video or images to facilitate their transmission or storage.

[0079] The following will provide several embodiments in the above exemplary application environment to illustrate the video encoding scheme. It should be understood that these embodiments can be implemented in many different forms and should not be construed as being limited to the embodiments set forth herein.

[0080] Embodiment 1

[0081] Figure 2 schematically illustrates a flowchart of a video encoding method according to Embodiment 1 of this application.

[0082] As shown in Figure 2, the video encoding method may include steps S200 to S206, wherein:

[0083] Step S200: Divide the video into multiple original video segments, wherein the multiple original video segments include the target video segment.

[0084] Step S202: Determine the dynamic region proportion of the target video segment, whereby the dynamic region proportion is the percentage of the dynamic region in the video frame.

[0085] Step S204: If the proportion of the dynamic region is lower than a preset proportion threshold, determine the first encoding parameters of the target video segment.

[0086] Step S206: Encode the target video segment according to the first encoding parameters.

[0087] The video encoding method provided in this embodiment adjusts the encoding parameters of dynamic video segments before encoding. Specifically, in this embodiment, the video is first segmented into video segments, and then the encoding parameters of the video segments containing a small proportion of dynamic areas are adjusted. Encoding with the adjusted parameters improves the encoding quality of the dynamic areas and maintains the stability of the actual image quality. Thus, while maintaining high video quality, the image quality of the dynamic areas that the human eye focuses on is also good, preventing image quality degradation phenomena such as blurring and blockiness.

[0088] The following, with reference to Figure 2, elaborates on each step in steps S200 to S206, as well as other optional steps.

[0089] Step S200: Divide the video into multiple original video segments, wherein the multiple original video segments include the target video segment.

[0090] In practice, video segmentation can be performed using editing tools, command-line tools, video processing APIs, etc. Depending on the specific needs, video segmentation can be based on video scenes, motion changes, audio features, face or object recognition, etc.

[0091] In this embodiment, the video is first segmented, which allows the encoder to focus on specific parts of the video (such as target video segments with a small proportion of dynamic areas) during the subsequent encoding process, reducing the waste of computing resources and improving encoding efficiency.

[0092] In an optional embodiment, as shown in Figure 3, step S200 may include:

[0093] S300, identify multiple scenes in the video.

[0094] S302, based on the multiple scenes, the video is divided into multiple original video segments; wherein, one original video segment corresponds to one scene.

[0095] Videos are typically composed of spliced segments of different scenes, each with distinct characteristics. Within the same scene, the actions of people or objects in different frames tend to be similar, and the dynamic area of the frame changes relatively little. Therefore, a method can be used to first detect scene transitions in the complete video to pinpoint their locations, and then segment the video based on these transitions. In practice, scene transition algorithms, video summarization techniques, user-defined scene change points, and video frame clustering analysis can be employed for scene recognition and video segmentation.

[0096] In this embodiment, video segmentation based on the scene ensures that most frames within the same segment have similar dynamic region proportions. This improves the applicability of the adjusted encoding parameters to different frames within the same segment, thereby enhancing the quality and efficiency of video encoding.
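
The scene-based segmentation of steps S300 and S302 can be sketched as follows. This is only a minimal illustration, not the method of this application: it assumes each frame has already been reduced to a normalized color histogram, and it declares a scene cut wherever the L1 distance between consecutive histograms exceeds a threshold. The function name `split_into_scenes` and the default threshold of 0.5 are hypothetical choices.

```python
from typing import List, Sequence, Tuple


def split_into_scenes(
    histograms: Sequence[Sequence[float]],
    cut_threshold: float = 0.5,  # hypothetical value, tune per content
) -> List[Tuple[int, int]]:
    """Return (start, end) frame index ranges, end exclusive, one per scene."""
    if not histograms:
        return []
    segments: List[Tuple[int, int]] = []
    start = 0
    for i in range(1, len(histograms)):
        # L1 distance between the histograms of consecutive frames.
        distance = sum(
            abs(a - b) for a, b in zip(histograms[i - 1], histograms[i])
        )
        if distance > cut_threshold:
            # A large jump is treated as a scene transition at frame i.
            segments.append((start, i))
            start = i
    segments.append((start, len(histograms)))
    return segments
```

Each returned range would correspond to one original video segment, so that one segment maps to one scene.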

[0097] Step S202: Determine the dynamic region proportion of the target video segment, whereby the dynamic region proportion is the percentage of the dynamic region in the video frame.

[0098] In practical video transcoding, metrics such as VMAF (Video Multimethod Assessment Fusion) can be used to measure the image quality of the encoded video. When using VMAF, the distortion in local areas of the image is first calculated, then the average of these local distortions is used to obtain the overall image distortion, which is then used to evaluate the video quality. During quality evaluation, the video frame can be divided into dynamic areas (where people or objects are moving rapidly) and static areas (where people or objects are stationary or moving slowly). The proportion of the dynamic area in the video frame determines its influence on the VMAF metric; the higher the proportion, the greater the influence.

[0099] In this embodiment, by calculating the proportion of the dynamic region in the video frame, the influence of the dynamic region's image quality on the video quality evaluation index after encoding can be determined. Therefore, based on the proportion of the dynamic region, subsequent encoding strategies can be adjusted to improve the quality of video encoding and reduce image quality degradation in the dynamic region.
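
The masking effect described above can be illustrated with a deliberately simplified model. The sketch below assumes the overall score is an area-weighted average of a dynamic-region score and a static-region score. This is an assumption that follows the averaging description in this document, not the actual VMAF computation, and the function name and the sample scores are illustrative only.

```python
def pooled_score(
    dynamic_proportion: float,
    dynamic_score: float,
    static_score: float,
) -> float:
    """Area-weighted average of region scores (simplified, not real VMAF)."""
    if not 0.0 <= dynamic_proportion <= 1.0:
        raise ValueError("dynamic_proportion must lie in [0, 1]")
    return (
        dynamic_proportion * dynamic_score
        + (1.0 - dynamic_proportion) * static_score
    )
```

Under this model, a dynamic proportion of 0.1 with an illustrative dynamic-region score of 70 and static-region score of 98 pools to about 95.2, so a heavily degraded dynamic region can still leave the overall score above a target such as 95.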

[0100] In an optional embodiment, as shown in Figure 4, step S202 may include:

[0101] S400, determine multiple video frames in the target video segment according to a preset interval.

[0102] S402, determine the optical flow value of each pixel in each of the video frames.

[0103] S404, determine the dynamic region of the target video segment based on the optical flow value of each pixel.

[0104] S406, determine the proportion of the dynamic region of the target video segment based on the dynamic region and the video frame region of the target video segment.

[0105] In practice, when the video frame rate is 30fps, a preset interval of 2 video frames per second can be set. Other intervals can be used for video frame extraction as needed or at other frame rates.
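
A minimal sketch of the frame extraction in step S400, assuming the preset interval is expressed as a number of sampled frames per second. The function name `sample_frame_indices` and its parameters are hypothetical.

```python
from typing import List


def sample_frame_indices(
    total_frames: int,
    fps: float,
    samples_per_second: float = 2.0,
) -> List[int]:
    """Return the indices of the frames to extract from a segment."""
    # At 30 fps and 2 samples per second, every 15th frame is kept.
    step = max(1, round(fps / samples_per_second))
    return list(range(0, total_frames, step))
```

For a 3-second segment at 30fps (90 frames), this selects frames 0, 15, 30, 45, 60, and 75.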

[0106] After video frame extraction is completed, the dense optical flow of all extracted video frames is calculated first, and then the optical flow between any two adjacent video frames is calculated to obtain the total optical flow results for the video segment. Based on the obtained optical flow results, the dynamic regions in the video frame can be determined, and thus the proportion of dynamic regions in the video segment can be obtained.

[0107] In this embodiment, by quantitatively analyzing the optical flow values of pixels in a video frame, dynamic regions in the video can be identified more accurately. This helps determine whether further parameter adjustments to the target video segment are needed in subsequent encoding work, which not only improves video quality but also enables more efficient and intelligent video processing.

[0108] In an optional embodiment, step S404 may include:

[0109] If the optical flow value of a pixel is greater than the preset optical flow threshold, the location of the pixel is determined to be a dynamic region.

[0110] In this embodiment, by determining the dynamic region in the image based on the optical flow value of the pixels in the video, the proportion of the dynamic region in the video image can be judged more accurately, which improves the targeting and efficiency of video encoding and helps to improve the quality of the video.
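
Steps S404 and S406 can be sketched as follows, assuming the optical flow magnitude of every pixel is already available as a two-dimensional array for each pair of adjacent sampled frames. How the per-frame results are combined into one value for the segment is not specified here; averaging the per-frame fractions is an assumption of this sketch, and all names are hypothetical.

```python
from typing import Sequence

FlowMap = Sequence[Sequence[float]]  # per-pixel optical flow magnitude


def dynamic_fraction(flow: FlowMap, flow_threshold: float) -> float:
    """Fraction of pixels whose flow magnitude exceeds the threshold."""
    total = sum(len(row) for row in flow)
    if total == 0:
        return 0.0
    # A pixel belongs to the dynamic region when its flow value is
    # greater than the preset optical flow threshold.
    dynamic = sum(1 for row in flow for value in row if value > flow_threshold)
    return dynamic / total


def segment_dynamic_proportion(
    flows: Sequence[FlowMap], flow_threshold: float
) -> float:
    """Average the per-frame dynamic fractions over the segment (assumed)."""
    if not flows:
        return 0.0
    return sum(dynamic_fraction(f, flow_threshold) for f in flows) / len(flows)
```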

[0111] In an optional embodiment, the method may include:

[0112] The preset optical flow threshold is adjusted based on the optical flow value of each pixel in each video frame.

[0113] In practice, the preset optical flow threshold can be adjusted to the average value of the optical flow graph. Depending on actual needs, other methods (such as the median or weighted average of the optical flow graph) can also be used to adjust the preset optical flow threshold.

[0114] In practice, the optical flow value of a pixel can be affected by factors such as camera movement and changes in lighting, and the preset optical flow threshold may not be applicable to all video frames.

[0115] In this embodiment, the preset optical flow threshold is adjusted based on the actual optical flow of pixels in each video frame. In other words, the preset optical flow threshold can be optimized according to the characteristics of the video content. The optimized preset optical flow threshold more accurately identifies dynamic regions in the video, improving encoding flexibility and the quality of the encoded video.
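
A minimal sketch of the threshold adjustment, assuming the optical flow map of a frame is a two-dimensional array of flow magnitudes. It supports the mean and the median mentioned above; the function name `adaptive_flow_threshold` and the `method` argument are hypothetical.

```python
from statistics import mean, median
from typing import Sequence


def adaptive_flow_threshold(
    flow: Sequence[Sequence[float]], method: str = "mean"
) -> float:
    """Derive the optical flow threshold from the flow map itself."""
    values = [value for row in flow for value in row]
    if not values:
        raise ValueError("flow map is empty")
    if method == "mean":
        return mean(values)
    if method == "median":
        return median(values)
    raise ValueError(f"unsupported method: {method}")
```

With the mean, a frame containing widespread moderate motion (for example from camera movement) raises the threshold, so only pixels that move faster than the frame as a whole are counted as dynamic.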

[0116] In an optional embodiment, as shown in Figure 5, the method may further include:

[0117] S500, each of the original video segments is encoded according to preset encoding parameters to obtain multiple encoded video segments.

[0118] S502, if the image quality of the target encoded video segment does not meet the preset conditions, adjust the preset encoding parameters to obtain the second encoding parameters; the target encoded video segment is any one of the plurality of encoded video segments.

[0119] S504, the first original video segment corresponding to the target encoded video segment is encoded according to the second encoding parameters.

[0120] In practice, the VMAF metric can be used to measure the image quality of the encoded video. Depending on the specific needs, PSNR, SSIM, etc., can also be used to evaluate video quality. In this embodiment, the RF parameter is used as the encoding parameter. In other embodiments, other parameters such as the QP value can also be used as encoding parameters.

[0121] It should be noted that step S502 can be executed repeatedly until the image quality of the final target encoded video segment meets the preset conditions. During the loop, the second encoding parameters used in each iteration are different.

[0122] In this embodiment, the encoded video can meet the expected image quality requirements by continuously adjusting the encoding parameters (such as increasing or decreasing the parameter values), thereby optimizing storage space and bandwidth usage while ensuring video quality, and achieving efficient video transmission and distribution.
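
The repeated adjustment of steps S500 to S504 can be sketched as a simple loop. The sketch assumes that the preset condition is "quality score at least equal to a target" and that only downward RF adjustments are needed; the callable `measure_quality` (which stands for encoding the segment at a given RF and scoring the result), the step size, and the iteration cap are all hypothetical.

```python
from typing import Callable, Tuple


def find_second_parameter(
    measure_quality: Callable[[float], float],
    preset_rf: float,
    target_quality: float,
    step: float = 1.0,
    max_iterations: int = 20,
) -> Tuple[float, float]:
    """Lower RF until the measured quality reaches the target."""
    rf = preset_rf
    quality = measure_quality(rf)
    for _ in range(max_iterations):
        if quality >= target_quality:
            break
        # A lower RF means a higher bitrate and higher image quality.
        rf -= step
        quality = measure_quality(rf)
    return rf, quality
```

If the segment already meets the target at the preset RF, the loop exits immediately and the preset value is returned unchanged.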

[0123] Step S204: If the proportion of the dynamic region is lower than a preset proportion threshold, determine the first encoding parameters of the target video segment.

[0124] When encoding videos where static or slow-moving scenes constitute a large portion of the frame, while dynamic areas with rapid motion occupy a smaller portion, static areas are easier to encode with minimal distortion, while dynamic areas, which are more difficult to encode, are more prone to distortion. However, because dynamic areas occupy a smaller portion of the frame, their impact on the VMAF metric is also smaller. Therefore, even if the VMAF metric is within acceptable limits, the dynamic areas that the human eye focuses on may exhibit degradation, such as blurring or blockiness.

[0125] In practice, a preset percentage threshold of 30% can be set. Alternatively, other values can be set depending on actual needs.

[0126] In this embodiment, by adjusting the encoding parameters of video segments with a dynamic region ratio lower than a preset ratio threshold, the overall image quality of the encoded video can meet the expected image quality requirements, while also ensuring the encoding effect of the dynamic region in the video, thereby improving the quality and efficiency of video encoding, and preventing damage to the dynamic region, such as blurring or blockiness.

[0127] In an optional embodiment, as shown in Figure 6, step S204 may include:

[0128] S600, if the target video segment is not the first original video segment, the first encoding parameters are obtained by adjusting the second encoding parameters.

[0129] S602, if the target video segment is the first original video segment, the first encoding parameters are obtained by adjusting according to the preset encoding parameters.

[0130] In some embodiments, each video segment may undergo a preliminary encoding operation before determining whether it contains a small dynamic region, so that the overall video quality meets the expected quality requirements. In other embodiments, depending on actual needs, the preliminary encoding step may be performed before or simultaneously with the determination step.

[0131] In this embodiment, for video segments that have not undergone preliminary encoding, or that have undergone preliminary encoding but already meet the expected image quality requirements without adjusting the preset encoding parameters, the first encoding parameters can be obtained by adjusting the preset encoding parameters (increasing or decreasing the parameter value based on the preset encoding parameters). For video segments that have undergone preliminary encoding and adjusted the preset encoding parameters before determining the dynamic region ratio, the second encoding parameters obtained after adjustment can be corrected (increasing or decreasing the parameter value) to determine the first encoding parameters that meet the requirements, thereby improving the efficiency of video encoding and achieving a more efficient video processing workflow.
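
The selection described in this paragraph can be sketched as follows. The sketch assumes RF is the encoding parameter, that a segment carries a second encoding parameter only if its preset parameter was adjusted during preliminary encoding, and that the adjustment lowers RF by a fixed offset. The offset value and all names are hypothetical.

```python
from typing import Optional


def first_encoding_parameter(
    preset_rf: float,
    second_rf: Optional[float],
    dynamic_proportion: float,
    proportion_threshold: float = 0.3,
    rf_offset: float = 2.0,  # hypothetical adjustment amount
) -> Optional[float]:
    """Return the adjusted RF, or None if the segment does not qualify."""
    if dynamic_proportion >= proportion_threshold:
        # Only segments with a small dynamic region are adjusted.
        return None
    # Start from the second parameter when one exists, else from the preset.
    base = second_rf if second_rf is not None else preset_rf
    return base - rf_offset
```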

[0132] Step S206: Encode the target video segment according to the first encoding parameters.

[0133] In this embodiment, a targeted first encoding parameter is used to encode the target video segment containing a small proportion of dynamic regions. This can specifically improve the image quality of dynamic regions, thereby ensuring the encoding effect of the target video segment, reducing image quality instability, and providing some anti-shake and anti-shifting effects.

[0134] In an optional embodiment, as shown in FIG. 7, step S206 may include:

[0135] S700, if the first encoding parameter is not less than the preset minimum encoding parameter, the target video segment is encoded according to the first encoding parameter.

[0136] S702, if the first encoding parameter is less than the preset minimum encoding parameter, the target video segment is encoded according to the preset minimum encoding parameter.

[0137] In actual video encoding, if the encoding parameters (such as RF parameters) are too low, it may lead to over-compression of the video, or even loss of detail and image damage, affecting video quality.

[0138] In this embodiment, the parameters ultimately used for encoding are controlled to be no less than a preset minimum encoding parameter. This ensures that the output quality during video encoding meets certain standards and avoids affecting video quality due to excessively low parameter settings.
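Steps S700 and S702 amount to a lower bound on the parameter used for encoding. A minimal sketch, assuming a single scalar parameter such as RF; the function name `clamp_to_minimum` is hypothetical.

```python
def clamp_to_minimum(first_param: float, min_param: float) -> float:
    """Return the parameter actually used for encoding (S700/S702)."""
    if first_param >= min_param:
        return first_param   # S700: the first encoding parameter is acceptable
    return min_param         # S702: fall back to the preset minimum


print(clamp_to_minimum(21, 18))  # 21
print(clamp_to_minimum(17, 18))  # 18
```

This is equivalent to `max(first_param, min_param)`; the explicit branches only mirror the two steps.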

[0139] To make this application easier to understand, an exemplary application is provided below with reference to Figure 8:

[0140] S11, input a video with a frame rate of 30 fps, and set the preset proportion threshold S = 0.3, the preset encoding parameter RF = 23, the preset image quality condition VMAF = 95, and the preset minimum encoding parameter RFmin = 18.

[0141] S12, perform scene switching detection on the video to obtain three segments A, B, and C.

[0142] S13, use preset encoding parameters to encode segment A to obtain segment A1, and the VMAF of segment A1 is 94.

[0143] S14, determine that the image quality of segment A1 does not meet the preset condition, and adjust the preset encoding parameter to obtain the second encoding parameter RF=21.

[0144] S15, use the second encoding parameter to encode segment A to obtain segment A2, where VMAF=96.

[0145] S16, determine that the image quality of segment A2 meets the preset condition.

[0146] S17, extract 2 frames per second from segment A for dense optical flow calculation, obtaining frames f1, f2, f3 ... fn.

[0147] S18, calculate the optical flow between f1 and f2, f2 and f3, ..., fn-1 and fn respectively, and obtain the optical flow result of segment A.

[0148] S19, according to the optical flow result of segment A, calculate the average optical flow of all pixel points to be 5 pixels / frame, and correspondingly set the preset optical flow threshold to 5 pixels / frame.

[0149] S20, among all pixel points of the segment A image, 50% of the pixel points have an optical flow value greater than 5 pixels / frame in at least one frame of image, and it can be obtained that the dynamic area ratio S1 of segment A is 0.5.

[0150] S21, judge that S1 > S, and there is no need to adjust segment A.

[0151] S22, encode segment B with the preset encoding parameters to obtain segment B1, and the VMAF of segment B1 is 97.

[0152] S23, judge that the image quality of segment B1 meets the preset conditions.

[0153] S24, referring to steps S17~S20, obtain the dynamic area ratio S2 of segment B as 0.2.

[0154] S25, judge that S2 < S, and segment B needs to be adjusted.

[0155] S26, adjust according to the preset encoding parameters to obtain the first encoding parameter of segment B as RF1 = 21, and max(RF1, RFmin) = RF1.

[0156] S27, encode segment B with the first encoding parameter to obtain segment B2.

[0157] S28, judge that there is no phenomenon such as image damage in the dynamic area of segment B2.

[0158] S29, encode segment C referring to the steps of S13~S15 to obtain segment C2, and the VMAF of segment C2 is 98.

[0159] S30, judge that the image quality of segment C2 meets the preset conditions.

[0160] S31, referring to steps S17~S20, obtain the dynamic area ratio S3 of segment C as 0.1.

[0161] S32, judge that S3 < S, and segment C needs to be adjusted.

[0162] S33, adjust according to the preset encoding parameters to obtain the first encoding parameter of segment C as RF1 = 17, and max(RF1, RFmin) = RFmin.

[0163] S34, encode segment C with the preset minimum encoding parameter to obtain segment C3.

[0164] S35, judge that there is no phenomenon such as image damage in the dynamic area of segment C3.
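The per-segment decisions in the walkthrough above (S21, S25 to S26, S32 to S34) can be replayed with a short sketch. The dynamic area ratios and the RF1 values are taken from the example. The `Segment` type and `final_rf` function are hypothetical names. The RF in effect for segment C is an assumption, since the example does not state C's second encoding parameter.

```python
from typing import NamedTuple, Optional

RATIO_THRESHOLD = 0.3  # S in step S11
RF_MIN = 18            # RFmin in step S11


class Segment(NamedTuple):
    name: str
    current_rf: int              # RF in effect after the image quality check
    dynamic_ratio: float         # S1, S2, S3 in the example
    candidate_rf: Optional[int]  # RF1 from S26/S33; None if no adjustment is made


def final_rf(seg: Segment) -> int:
    """Return the RF used for the final encode of a segment."""
    if seg.dynamic_ratio >= RATIO_THRESHOLD or seg.candidate_rf is None:
        return seg.current_rf             # S21: no adjustment needed
    return max(seg.candidate_rf, RF_MIN)  # S26/S33: bounded below by RFmin


segments = [
    Segment("A", current_rf=21, dynamic_ratio=0.5, candidate_rf=None),
    Segment("B", current_rf=23, dynamic_ratio=0.2, candidate_rf=21),
    # current_rf=21 for C is assumed; it is not used because C is adjusted.
    Segment("C", current_rf=21, dynamic_ratio=0.1, candidate_rf=17),
]

for seg in segments:
    print(seg.name, final_rf(seg))  # A 21, B 21, C 18
```

Segment C shows the clamp in action: the candidate RF1 = 17 falls below RFmin = 18, so the final encode uses 18.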

[0165] Embodiment 2

[0166] Figure 9 schematically illustrates a block diagram of a video encoding apparatus according to Embodiment 2 of this application. This apparatus can be divided into one or more program modules. One or more program modules are stored in a storage medium and executed by one or more processors to complete the embodiments of this application. The program modules referred to in the embodiments of this application are a series of computer-readable instruction segments capable of performing specific functions. The following description will specifically introduce the functions of each program module in this embodiment. As shown in Figure 9, the apparatus 1000 may include: a segmentation module 1100, a first determination module 1200, a second determination module 1300, and an encoding module 1400, wherein:

[0167] The segmentation module 1100 is used to segment a video into multiple original video segments, wherein the multiple original video segments include a target video segment;

[0168] The first determining module 1200 is used to determine the dynamic region proportion of the target video segment, wherein the dynamic region proportion is the proportion of the dynamic region in the video frame;

[0169] The second determining module 1300 is used to determine the first encoding parameters of the target video segment when the proportion of the dynamic region is lower than a preset proportion threshold; and

[0170] The encoding module 1400 is used to encode the target video segment according to the first encoding parameters.

[0171] As an optional embodiment, the segmentation module 1100 can also be used for:

[0172] Identify multiple scenes in the video;

[0173] Based on the multiple scenes, the video is divided into multiple original video segments, wherein one original video segment corresponds to one scene.
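Once scene boundaries are known, the division into original video segments is a matter of cutting the frame range at those boundaries. The sketch below assumes the scene-switch detector has already produced a list of cut frame indices; the detector itself is not shown, and `split_by_scene_cuts` is a hypothetical name.

```python
from typing import List, Tuple


def split_by_scene_cuts(frame_count: int, cuts: List[int]) -> List[Tuple[int, int]]:
    """Return half-open frame ranges (start, end), one per scene."""
    # Keep only cut points strictly inside the video, without duplicates.
    inner = sorted(c for c in set(cuts) if 0 < c < frame_count)
    bounds = [0] + inner + [frame_count]
    return [(bounds[i], bounds[i + 1]) for i in range(len(bounds) - 1)]


# A 300-frame video with scene switches at frames 90 and 210 yields three segments.
print(split_by_scene_cuts(300, [90, 210]))  # [(0, 90), (90, 210), (210, 300)]
```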

[0174] As an optional embodiment, the first determining module 1200 may also be used for:

[0175] Multiple video frames in the target video segment are determined according to a preset interval;

[0176] Determine the optical flow value of each pixel in each of the video frames;

[0177] The dynamic region of the target video segment is determined based on the optical flow value of each pixel.

[0178] The proportion of the dynamic region of the target video segment is determined based on the dynamic region and the video frame region of the target video segment.
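The four operations above can be sketched in plain Python. The sketch assumes the per-pixel optical flow magnitudes have already been computed by some dense optical flow routine, which is not shown. A pixel position counts as dynamic if its flow exceeds the threshold in at least one sampled frame pair. The names `sample_frame_indices` and `dynamic_region_ratio` are hypothetical.

```python
from typing import List

FlowField = List[List[float]]  # per-pixel flow magnitude for one frame pair


def sample_frame_indices(frame_count: int, fps: int, samples_per_second: int) -> List[int]:
    """Pick frame indices at a preset interval, e.g. 2 frames per second."""
    step = max(1, fps // samples_per_second)
    return list(range(0, frame_count, step))


def dynamic_region_ratio(flows: List[FlowField], threshold: float) -> float:
    """Fraction of pixel positions that are dynamic in at least one flow field."""
    height, width = len(flows[0]), len(flows[0][0])
    dynamic = 0
    for y in range(height):
        for x in range(width):
            if any(field[y][x] > threshold for field in flows):
                dynamic += 1
    return dynamic / (height * width)


print(sample_frame_indices(60, fps=30, samples_per_second=2))  # [0, 15, 30, 45]

# Two 2x2 flow fields: one pixel moves in the first, another in the second.
flows = [
    [[0.0, 6.0], [0.0, 0.0]],
    [[0.0, 0.0], [7.0, 0.0]],
]
print(dynamic_region_ratio(flows, threshold=5.0))  # 0.5
```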

[0179] As an optional embodiment, the device 1000 can also be used for:

[0180] If the optical flow value of a pixel is greater than the preset optical flow threshold, the location of the pixel is determined to be a dynamic region.

[0181] As an optional embodiment, the device 1000 can also be used for:

[0182] The preset optical flow threshold is adjusted based on the optical flow value of each pixel in each video frame.
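One way to derive the threshold from the measured flow values is to use their mean, as in step S19 of the exemplary application. The sketch below shows only that choice; other statistics would fit the same description, and `mean_flow_threshold` is a hypothetical name.

```python
from typing import List

FlowField = List[List[float]]  # per-pixel flow magnitude for one frame pair


def mean_flow_threshold(flows: List[FlowField]) -> float:
    """Average flow magnitude over every pixel of every flow field."""
    values = [v for field in flows for row in field for v in row]
    return sum(values) / len(values)


print(mean_flow_threshold([[[1.0, 9.0], [4.0, 6.0]]]))  # 5.0
```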

[0183] As an optional embodiment, the device 1000 can also be used for:

[0184] Each of the original video segments is encoded according to preset encoding parameters to obtain multiple encoded video segments;

[0185] If the image quality of the target encoded video segment does not meet the preset conditions, the preset encoding parameters are adjusted to obtain the second encoding parameters; the target encoded video segment is any one of the plurality of encoded video segments;

[0186] The first original video segment corresponding to the target encoded video segment is encoded according to the second encoding parameters.
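The image quality check above can be sketched as a single function. Several details are assumptions, not taken from the application: `score` is a hypothetical callable that encodes the segment at a given RF and returns its VMAF, a score equal to the target is treated as passing, and the adjustment is a fixed `step`.

```python
from typing import Callable, Tuple


def quality_pass(preset_rf: int, target_vmaf: float,
                 score: Callable[[int], float], step: int) -> Tuple[int, bool]:
    """Return (RF to use, whether the preset parameter was adjusted)."""
    if score(preset_rf) >= target_vmaf:
        return preset_rf, False     # the preset parameter already meets the condition
    return preset_rf - step, True   # second encoding parameter


# Scores from the exemplary application for segment A: RF=23 gives 94, RF=21 gives 96.
measured = {23: 94.0, 21: 96.0}
print(quality_pass(23, 95.0, measured.__getitem__, step=2))  # (21, True)

# Segment B scores 97 at the preset RF, so the preset is kept.
print(quality_pass(23, 95.0, lambda rf: 97.0, step=2))       # (23, False)
```

The boolean in the result is what distinguishes a first original video segment, one re-encoded with the second encoding parameters, from a segment that kept the preset.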

[0187] As an optional embodiment, the second determining module 1300 may also be used for:

[0188] When the target video segment is a first original video segment, the first encoding parameters are obtained by adjusting the second encoding parameters.

[0189] If the target video segment is not the first original video segment, the first encoding parameters are obtained by adjusting according to the preset encoding parameters.

[0190] As an optional embodiment, the encoding module 1400 is further configured to:

[0191] If the first encoding parameter is not less than the preset minimum encoding parameter, the target video segment is encoded using the first encoding parameter;

[0192] If the first encoding parameter is less than the preset minimum encoding parameter, the target video segment is encoded using the preset minimum encoding parameter.

[0193] Embodiment 3

[0194] Figure 10 schematically illustrates a hardware architecture diagram of a computer device 10000 suitable for implementing a video encoding method according to Embodiment 3 of this application. In some embodiments, the computer device 10000 may be a smartphone, wearable device, tablet computer, personal computer, vehicle terminal, game console, virtual device, workbench, digital assistant, set-top box, robot, or other terminal device. In other embodiments, the computer device 10000 may be a rack server, blade server, tower server, or cabinet server (including independent servers or server clusters composed of multiple servers). As shown in Figure 10, the computer device 10000 includes, but is not limited to: a memory 10010, a processor 10020, and a network interface 10030 that can communicate and be linked to each other via a system bus. Wherein:

[0195] The memory 10010 includes at least one type of computer-readable storage medium, including flash memory, hard disk, multimedia card, card-type memory (e.g., SD or DX memory), random access memory (RAM), static random access memory (SRAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), programmable read-only memory (PROM), magnetic memory, magnetic disk, optical disk, etc. In some embodiments, the memory 10010 may be an internal storage module of a computer device 10000, such as the hard disk or memory of the computer device 10000. In other embodiments, the memory 10010 may also be an external storage device of the computer device 10000, such as a plug-in hard disk, smart media card (SMC), secure digital (SD) card, flash card, etc., equipped on the computer device 10000. Of course, the memory 10010 may also include both the internal storage module and the external storage device of the computer device 10000. In this embodiment, the memory 10010 is typically used to store the operating system and various application software installed on the computer device 10000, such as program code for video encoding methods. In addition, the memory 10010 can also be used to temporarily store various types of data that have been output or will be output.

[0196] In some embodiments, processor 10020 may be a central processing unit (CPU), controller, microcontroller, microprocessor, or other chip. Processor 10020 is typically used to control the overall operation of computer device 10000, such as performing control and processing related to data interaction or communication with computer device 10000. In this embodiment, processor 10020 is used to run program code stored in memory 10010 or process data.

[0197] Network interface 10030 may include a wireless network interface or a wired network interface, which is typically used to establish a communication link between computer device 10000 and other computer devices. For example, network interface 10030 is used to connect computer device 10000 to an external terminal via a network, establishing a data transmission channel and communication link between computer device 10000 and the external terminal. The network may be an intranet, the Internet, Global System for Mobile Communication (GSM), Wideband Code Division Multiple Access (WCDMA), 4G network, 5G network, Bluetooth, Wi-Fi, or other wireless or wired networks.

[0198] It should be noted that Figure 10 only shows a computer device with components 10010-10030, but it should be understood that it is not required to implement all the components shown, and more or fewer components may be implemented instead.

[0199] In this embodiment, the video encoding method stored in memory 10010 can also be divided into one or more program modules and executed by one or more processors (such as processor 10020) to complete the embodiment of this application.

[0200] Embodiment 4

[0201] This application also provides a computer-readable storage medium storing computer-readable instructions thereon, wherein the computer-readable instructions, when executed by a processor, implement the steps of the video encoding method in the embodiment.

[0202] In this embodiment, the computer-readable storage medium includes flash memory, hard disk, multimedia card, card-type memory (e.g., SD or DX memory), random access memory (RAM), static random access memory (SRAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), programmable read-only memory (PROM), magnetic memory, magnetic disk, optical disk, etc. In some embodiments, the computer-readable storage medium may be an internal storage unit of a computer device, such as the hard disk or memory of the computer device. In other embodiments, the computer-readable storage medium may also be an external storage device of the computer device, such as a plug-in hard disk, smart media card (SMC), secure digital (SD) card, flash card, etc., equipped on the computer device. Of course, the computer-readable storage medium may include both the internal storage unit and the external storage device of the computer device. In this embodiment, the computer-readable storage medium is typically used to store the operating system and various application software installed on the computer device, such as the program code of the video encoding method in the embodiment. In addition, the computer-readable storage medium can also be used to temporarily store various types of data that have been output or will be output.

[0203] Embodiment 5

[0204] This application also provides a computer program product, including computer-readable instructions that, when executed by a processor, implement the methods described in the above embodiments.

[0205] Obviously, those skilled in the art should understand that the modules or steps of the embodiments of this application described above can be implemented using general-purpose computer devices. They can be centralized on a single computer device or distributed across a network of multiple computer devices. Optionally, they can be implemented using computer-executable program code, thereby storing them in a storage device for execution by a computer device. In some cases, the steps shown or described can be performed in a different order than those presented here, or they can be fabricated as separate integrated circuit modules, or multiple modules or steps can be fabricated as a single integrated circuit module. Thus, the embodiments of this application are not limited to any particular combination of hardware and software.

[0206] It should be noted that the above are merely preferred embodiments of this application and do not limit the scope of patent protection of this application. Any equivalent structural or procedural changes made using the content of this application's specification and drawings, or direct or indirect applications in other related technical fields, are similarly included within the scope of patent protection of this application.

Claims

1. A video encoding method, wherein the method comprises: dividing a video into multiple original video segments, the multiple original video segments including a target video segment; determining a dynamic region proportion of the target video segment, the dynamic region proportion being the proportion of the dynamic region in the video frame; when the dynamic region proportion is lower than a preset proportion threshold, determining first encoding parameters of the target video segment; and encoding the target video segment according to the first encoding parameters.

2. The method according to claim 1, wherein dividing the video into multiple original video segments comprises: identifying multiple scenes in the video; and dividing the video into the multiple original video segments based on the multiple scenes, wherein one original video segment corresponds to one scene.

3. The method according to claim 1, wherein determining the dynamic region proportion of the target video segment comprises: determining multiple video frames in the target video segment according to a preset interval; determining the optical flow value of each pixel in each of the video frames; determining the dynamic region of the target video segment based on the optical flow value of each pixel; and determining the dynamic region proportion of the target video segment based on the dynamic region and the video frame region of the target video segment.

4. The method according to claim 3, wherein determining the dynamic region of the target video segment based on the optical flow value of each pixel comprises: if the optical flow value of a pixel is greater than a preset optical flow threshold, determining the location of the pixel to be a dynamic region.

5. The method according to claim 4, wherein the method further comprises: adjusting the preset optical flow threshold based on the optical flow value of each pixel in each of the video frames.

6. The method according to claim 1, wherein the method further comprises: encoding each of the original video segments according to preset encoding parameters to obtain multiple encoded video segments; if the image quality of a target encoded video segment does not meet a preset condition, adjusting the preset encoding parameters to obtain second encoding parameters, the target encoded video segment being any one of the multiple encoded video segments; and encoding a first original video segment corresponding to the target encoded video segment according to the second encoding parameters.

7. The method according to claim 6, wherein determining the first encoding parameters of the target video segment comprises: when the target video segment is a first original video segment, obtaining the first encoding parameters by adjusting the second encoding parameters; and when the target video segment is not a first original video segment, obtaining the first encoding parameters by adjusting the preset encoding parameters.

8. The method according to any one of claims 1 to 7, wherein encoding the target video segment according to the first encoding parameters comprises: if the first encoding parameter is not less than a preset minimum encoding parameter, encoding the target video segment according to the first encoding parameter; and if the first encoding parameter is less than the preset minimum encoding parameter, encoding the target video segment according to the preset minimum encoding parameter.

9. A video encoding apparatus, wherein the apparatus comprises: a segmentation module used to segment a video into multiple original video segments, the multiple original video segments including a target video segment; a first determining module used to determine a dynamic region proportion of the target video segment, the dynamic region proportion being the proportion of the dynamic region in the video frame; a second determining module used to determine first encoding parameters of the target video segment when the dynamic region proportion is lower than a preset proportion threshold; and an encoding module used to encode the target video segment according to the first encoding parameters.

10. The video encoding apparatus according to claim 9, wherein the segmentation module is further used to: identify multiple scenes in the video; and divide the video into the multiple original video segments based on the multiple scenes, wherein one original video segment corresponds to one scene.

11. A computer device, comprising: at least one processor; and a memory communicatively connected to the at least one processor; wherein the memory stores instructions executable by the at least one processor, and the instructions, when executed by the at least one processor, enable the at least one processor to perform the following operations: dividing a video into multiple original video segments, the multiple original video segments including a target video segment; determining a dynamic region proportion of the target video segment, the dynamic region proportion being the proportion of the dynamic region in the video frame; when the dynamic region proportion is lower than a preset proportion threshold, determining first encoding parameters of the target video segment; and encoding the target video segment according to the first encoding parameters.

12. The computer device according to claim 11, wherein dividing the video into multiple original video segments comprises: identifying multiple scenes in the video; and dividing the video into the multiple original video segments based on the multiple scenes, wherein one original video segment corresponds to one scene.

13. The computer device according to claim 11, wherein determining the dynamic region proportion of the target video segment comprises: determining multiple video frames in the target video segment according to a preset interval; determining the optical flow value of each pixel in each of the video frames; determining the dynamic region of the target video segment based on the optical flow value of each pixel; and determining the dynamic region proportion of the target video segment based on the dynamic region and the video frame region of the target video segment.

14. The computer device according to claim 11, wherein determining the dynamic region of the target video segment based on the optical flow value of each pixel comprises: if the optical flow value of a pixel is greater than a preset optical flow threshold, determining the location of the pixel to be a dynamic region.

15. The computer device according to claim 14, wherein the at least one processor is further capable of performing the following operation: adjusting the preset optical flow threshold based on the optical flow value of each pixel in each of the video frames.

16. The computer device according to claim 11, wherein the at least one processor is further capable of performing the following operations: encoding each of the original video segments according to preset encoding parameters to obtain multiple encoded video segments; if the image quality of a target encoded video segment does not meet a preset condition, adjusting the preset encoding parameters to obtain second encoding parameters, the target encoded video segment being any one of the multiple encoded video segments; and encoding a first original video segment corresponding to the target encoded video segment according to the second encoding parameters.

17. The computer device according to claim 16, wherein determining the first encoding parameters of the target video segment comprises: when the target video segment is a first original video segment, obtaining the first encoding parameters by adjusting the second encoding parameters; and when the target video segment is not a first original video segment, obtaining the first encoding parameters by adjusting the preset encoding parameters.

18. The computer device according to any one of claims 11 to 17, wherein encoding the target video segment according to the first encoding parameters comprises: if the first encoding parameter is not less than a preset minimum encoding parameter, encoding the target video segment according to the first encoding parameter; and if the first encoding parameter is less than the preset minimum encoding parameter, encoding the target video segment according to the preset minimum encoding parameter.

19. A computer-readable storage medium, wherein the computer-readable storage medium stores computer instructions that, when executed by a processor, implement the method according to any one of claims 1 to 8.

20. A computer program product comprising computer-readable instructions, wherein the computer-readable instructions, when executed by a processor, implement the steps of the method according to any one of claims 1 to 8.