Method, medium, and computer program product for testing a video decoder

By using feature operators and global-local analysis methods in video decoder testing, the false alarm problem of traditional methods in heterogeneous environments is solved, achieving efficient and accurate decoding output verification, and adapting to different test scenarios and hardware platforms.

CN121842376B · Active · Publication Date: 2026-06-26 · MOXIN ARTIFICIAL INTELLIGENCE TECH (SHENZHEN) CO LTD

Patent Information

Authority / Receiving Office
CN · China
Patent Type
Patents(China)
Current Assignee / Owner
MOXIN ARTIFICIAL INTELLIGENCE TECH (SHENZHEN) CO LTD
Filing Date
2026-03-13
Publication Date
2026-06-26


Abstract

The present application provides a method, medium and computer program product for testing a video decoder. A method for testing a video decoder includes determining, based on codec parameter information corresponding to an encoded video stream input to a video decoder under test and test configuration information, a feature operator for verifying correctness of a decoded output generated by the video decoder under test in decoding the encoded video stream; obtaining a reference decoded output corresponding to the encoded video stream, the reference decoded output determined based on a consensus verification of a plurality of candidate decoded outputs generated by a plurality of video decoders in decoding the encoded video stream; extracting, based on the feature operator, a test feature of an image of the decoded output and a reference feature of an image of the reference decoded output; and comparing the test feature and the reference feature to determine whether the decoded output is correct, the comparison based on a global structural similarity analysis and a local noise tolerance analysis.

Description

Technical Field

[0001] This disclosure relates to the field of video processing and quality testing, and more specifically, to methods for testing video decoders, non-transitory computer-readable media, and computer program products.

Background Technology

[0002] Video encoding and decoding technology is the core of multimedia processing, and the correctness of video codecs directly affects the quality of video processing and user experience. In the research, development, testing, and integration of video codecs, how to efficiently and accurately verify the correctness of their decoding output is a long-standing and increasingly serious technical challenge.

Summary of the Invention

[0003] In one aspect, this application discloses a method for testing a video decoder, comprising: determining a feature operator for verifying the correctness of a decoded output under test based on encoding / decoding parameter information and test configuration information corresponding to an encoded video stream input to the video decoder under test, wherein the decoded output under test is generated by the video decoder under test decoding the encoded video stream; obtaining a reference decoded output corresponding to the encoded video stream, wherein the reference decoded output is determined by consensus verification based on multiple candidate decoded outputs generated by multiple video decoders decoding the encoded video stream; extracting test features of the image of the decoded output under test and reference features of the image of the reference decoded output based on the feature operator; and comparing the test features with the reference features to determine whether the decoded output under test is correct, wherein the comparison is based on global structural similarity analysis for detecting structural errors and local noise tolerance analysis for detecting local noise.

[0004] In other respects, this application discloses a non-transitory computer-readable medium storing instructions and a computer program product including instructions. These instructions, when executed by one or more processors, cause the processors to perform the methods described in this application.

Attached Figure Description

[0005] When read in conjunction with the accompanying drawings, various aspects of this disclosure are best understood through the following detailed description. It should be noted that, in accordance with standard practice in the art, the features are not drawn to scale. In fact, for clarity of discussion, the dimensions of the features may be arbitrarily increased or decreased.

[0006] Figure 1 shows a schematic diagram of a method for testing a video decoder according to an embodiment of this application.

[0007] Figure 2 shows a schematic diagram of a computing device that can implement embodiments of this application.

Detailed Implementation

[0008] The following disclosure provides numerous different embodiments or examples for implementing various features of the provided subject matter. Specific examples of components and arrangements are described below to simplify this disclosure. Of course, these are merely examples and not limiting.

[0009] In the research, development, testing, and integration of video codecs, verifying the correctness of the decoder's output is a fundamental and crucial task. Traditional methods for verifying video decoder output typically involve pixel-level comparisons, such as using Peak Signal-to-Noise Ratio (PSNR) or Structural Similarity Index (SSIM). However, these traditional verification methods often face the following challenges and problems in practical applications, especially in various heterogeneous decoding environments:

[0010] First, traditional methods easily generate false positives due to compliance differences. Testing video decoders involves a wide variety of test scenarios, which may involve different video codec formats (e.g., H.264, H.265, AV1, etc.), resolutions, frame rates, bitstream formats, compression ratios, and dynamic ranges. Furthermore, running video decoders involves a variety of hardware implementation platforms (e.g., CPU, GPU, NPU, etc.). Different test scenarios or different hardware implementation platforms can cause subtle but fully compliant differences in video decoding output, such as those arising from rounding errors and color space conversions. Traditional verification methods (e.g., PSNR, SSIM, etc.) using pixel-level comparisons cannot effectively distinguish these compliance differences from real decoding errors, resulting in an extremely high false positive rate, which seriously affects testing efficiency and the accuracy of conclusions.

[0011] Second, it is difficult to adapt to different test scenarios or different hardware implementation platforms. Different test scenarios or different hardware implementation platforms have different sensitivities to image features. The single, fixed feature extraction algorithms used by traditional verification methods (such as PSNR, SSIM, etc.) are difficult to adapt to all test scenarios and hardware implementation platforms.

[0012] Third, there is a lack of a unified gold standard. If the output of a specific decoder (e.g., a standard software reference decoder) is used as the sole fixed benchmark, normal compliance differences are easily misjudged as errors, resulting in false alarms. Manually verifying and maintaining an absolutely correct benchmark library for a massive number of test cases would be extremely costly and impractical.

[0013] To address the aforementioned issues, this application discloses various embodiments for testing video decoders. More specifically, it provides methods for verifying the correctness of the decoded output of a video decoder. These methods can distinguish between compliance differences and genuine decoding errors, automatically adapt to different test scenarios or hardware implementation platforms, and dynamically establish reliable verification benchmarks, thereby significantly improving testing efficiency and accuracy.

[0014] Figure 1 shows a schematic diagram of a method 100 for testing a video decoder according to an embodiment of this application. As shown in Figure 1, the method 100 may include steps S102, S104, S106 and S108.

[0015] In step S102, based on the encoding / decoding parameter information and test configuration information corresponding to the encoded video stream input to the video decoder under test, a feature operator is determined to verify the correctness of the decoded output under test, wherein the decoded output under test is generated by the video decoder under test decoding the encoded video stream.

[0016] Encoding and decoding parameter information can reflect and characterize the technical specifications and encoding and decoding characteristics associated with the encoded video stream. In some embodiments, the encoding and decoding parameter information may include one or more of the following: video encoding and decoding format, resolution, frame rate, bitstream format, compression ratio, and dynamic range.

[0017] Video codec format refers to the standards and syntax followed by video compression, such as H.264 / AVC, H.265 / HEVC, AV1, VP9, etc. Resolution refers to the pixel size of the video image, such as 1920x1080 (Full HD), 3840x2160 (4K), 7680x4320 (8K), etc. Frame rate refers to the number of frames displayed per second, such as 24 frames per second (fps), 30fps, 60fps, 120fps, etc. Bitstream format refers to the encapsulation or syntax structure format of the encoded video stream, such as MPEG-2 transport stream (TS), MP4 file format, or raw H.264 / H.265 elementary stream (ES), etc.

[0018] Compression ratio characterizes the degree of compression in an encoded video stream (e.g., low compression ratio, high compression ratio, etc.). Compression ratio is one of the most critical factors affecting the image quality of a decoded video stream. In high compression ratio scenarios, the quantization distortion introduced by the encoding is more significant, easily leading to block artifacts or texture blurring. Therefore, as an example, in high compression ratio scenarios, the monitoring of "block artifact" features can be enhanced by selecting feature operators targeting block boundary discontinuities (e.g., block artifact detection features) to verify the correctness of the decoded output under test. As another example, in high compression ratio scenarios, feature operators sensitive to block boundary discontinuities and texture distortion (e.g., gradient-based block artifact detection features, discrete cosine transform domain features, etc.) can be selected to verify the correctness of the decoded output under test.
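As a rough illustration of a gradient-based block artifact detection feature, the sketch below compares the average luma step across assumed 8-pixel block boundaries with the average step elsewhere. The function name `blockiness`, the 8-pixel block size, and the restriction to vertical boundaries are assumptions made for illustration; the application does not specify a particular formula.

```python
def blockiness(image, block=8):
    """Ratio of the mean horizontal luma step at block boundaries to the
    mean step inside blocks. Values well above 1 suggest block artifacts.

    `image` is a list of rows of luma samples. Only vertical block
    boundaries are examined (an illustrative simplification).
    """
    boundary, interior = [], []
    for row in image:
        for x in range(1, len(row)):
            step = abs(row[x] - row[x - 1])
            # Columns that are multiples of `block` start a new block.
            (boundary if x % block == 0 else interior).append(step)
    if not boundary or not interior:
        raise ValueError("image must be wider than one block")
    mean_boundary = sum(boundary) / len(boundary)
    mean_interior = sum(interior) / len(interior)
    return mean_boundary / (mean_interior + 1e-9)


# Hypothetical test patterns: a smooth ramp and a staircase with 8-pixel steps.
smooth = [[x for x in range(32)] for _ in range(4)]
blocky = [[(x // 8) * 16 for x in range(32)] for _ in range(4)]
```

On the smooth ramp the ratio stays close to 1, while the staircase pattern, whose only steps fall on block boundaries, yields a very large ratio.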

[0019] Dynamic range refers to the range of brightness that a video can represent, and is generally divided into Standard Dynamic Range (SDR) and High Dynamic Range (HDR). HDR videos typically have a wider brightness range and richer colors (e.g., using the BT.2020 color gamut). Therefore, as an example, for HDR scenes, the monitoring of color bit depth shift can be enhanced, and feature operators that can effectively evaluate brightness distribution, color volume, and the retention of highlight or shadow details (e.g., perceptual color difference features or tone mapping distortion evaluation features for HDR) can be selected to verify the correctness of the decoded output under test. As another example, for HDR scenes, an extended color histogram (e.g., supporting wide color gamut and deep bit depth) can be selected as a feature operator to verify the correctness of the decoded output under test.
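One possible reading of an "extended color histogram" is a per-channel histogram whose range follows the sample bit depth. The sketch below assumes 10-bit samples and compares histograms by intersection; the names `channel_histogram` and `histogram_intersection`, the bin count, and the choice of intersection as the similarity measure are illustrative assumptions and are not taken from the application.

```python
def channel_histogram(samples, bit_depth=10, bins=64):
    """Normalized histogram of one color channel at the given bit depth."""
    levels = 1 << bit_depth          # e.g. 1024 code values for 10-bit video
    hist = [0] * bins
    for s in samples:
        if not 0 <= s < levels:
            raise ValueError(f"sample {s} outside {bit_depth}-bit range")
        hist[s * bins // levels] += 1
    total = len(samples)
    return [count / total for count in hist]


def histogram_intersection(h1, h2):
    """Similarity in [0, 1]; 1 means the two histograms are identical."""
    return sum(min(a, b) for a, b in zip(h1, h2))


# Hypothetical 10-bit samples, a one-code-value rounding variant, and a
# variant wrongly truncated to 8 bits (a bit depth shift).
reference = [0, 512, 1023]
rounded = [1, 513, 1022]
truncated = [s >> 2 for s in reference]
```

With coarse bins, the rounding variant falls into the same bins as the reference, whereas the truncated variant collapses into the lowest bin and scores a much lower intersection.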

[0020] Test configuration information can describe information related to the environment in which the test is performed. In some embodiments, test configuration information may include one or more of the following: the type of hardware platform running the video decoder under test, and the operating mode of the video decoder under test. The type of hardware platform may include, for example, a CPU, GPU, and NPU. The operating mode of the video decoder under test may include, for example, a high-performance mode, a high-precision mode, a low-power mode, etc.

[0021] Those skilled in the art will understand that the above description of the encoding / decoding parameter information and test configuration information is merely an exemplary description for ease of explanation and is not intended to be limiting. In practical applications, any other suitable type of encoding / decoding parameter information and test configuration information may exist, and this application does not impose any limitations on this.

[0022] Based on relevant encoding / decoding parameter information and test configuration information, one or more feature operators can be determined to verify the correctness of the decoded output under test, wherein the decoded output under test is a decoded video stream generated by the video decoder under test decoding the encoded video stream. In some embodiments, the feature operators may include one or more of the following: edge features, color histograms, deep learning features, and block effect detection features.

[0023] As an example, based on the high compression ratio information contained in the codec parameter information, feature operators can include block artifact detection features. As an example, based on the high compression ratio information contained in the codec parameter information, feature operators can include feature operators sensitive to block boundary discontinuities and texture distortion, such as gradient-based block artifact detection features, discrete cosine transform domain features, etc. As an example, based on the high dynamic range information contained in the codec parameter information, feature operators can include feature operators capable of effectively evaluating brightness distribution, color volume, and highlight or shadow detail retention, such as perceptual color difference features or tone mapping distortion evaluation features for HDR. As an example, based on the high dynamic range information contained in the codec parameter information, feature operators can include extended color histograms (e.g., supporting wide color gamut and deep bit depth).

[0024] Those skilled in the art will understand that the above description of feature operators is merely illustrative for ease of explanation and is not intended to be limiting. In practical applications, any other suitable type of feature operator or various combinations of feature operators can be selected as needed, and this application does not impose any limitations on this.
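The selection in step S102 can be pictured as a small rule table that maps codec parameters and test configuration to a list of feature operators. The sketch below is a minimal illustration under stated assumptions: the class names, the string labels, and in particular the rule tied to the operating mode are hypothetical, since the application gives examples only for compression ratio and dynamic range.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CodecParams:
    codec: str          # e.g. "H.265"
    compression: str    # "low" or "high" (hypothetical labels)
    dynamic_range: str  # "SDR" or "HDR"


@dataclass(frozen=True)
class RunConfig:
    platform: str       # "CPU", "GPU", or "NPU"
    mode: str           # e.g. "high-performance", "high-precision", "low-power"


def select_feature_operators(params, config):
    """Return the names of the feature operators to use for one test case."""
    operators = ["edge"]  # assumed default, not stated in the source
    if params.compression == "high":
        # High compression: watch block boundaries and texture distortion.
        operators += ["block_artifact", "dct_domain"]
    if params.dynamic_range == "HDR":
        # HDR: use operators that cover wide gamut and deep bit depth.
        operators += ["extended_color_histogram", "perceptual_color_difference"]
    else:
        operators.append("color_histogram")
    if config.mode == "high-precision":
        # Purely illustrative rule for how test configuration might matter.
        operators.append("deep_learning")
    return operators
```

For instance, a highly compressed HDR stream decoded on a GPU in high-performance mode would, under these assumed rules, be checked with edge, block artifact, DCT-domain, extended color histogram, and perceptual color difference operators.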

[0025] In step S104, a reference decoded output corresponding to the encoded video stream is obtained.

[0026] In some embodiments, obtaining a reference decoded output corresponding to the encoded video stream may include: querying a database to see if a baseline decoded output corresponding to the encoded video stream exists; if a baseline decoded output corresponding to the encoded video stream exists, reading the baseline decoded output as the reference decoded output corresponding to the encoded video stream; and if no baseline decoded output corresponding to the encoded video stream exists, creating a baseline decoded output corresponding to the encoded video stream and storing the baseline decoded output in the database. In some embodiments, the baseline decoded output (or reference decoded output) may be created by means of expert system verification or multi-decoder consensus verification.
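The query-then-create flow for the baseline decoded output resembles a get-or-create cache. The sketch below uses an in-memory dictionary as a stand-in for the database and keys each entry by the SHA-256 digest of the encoded stream; both choices, as well as the names `BaselineStore` and `get_or_create`, are assumptions for illustration only.

```python
import hashlib


class BaselineStore:
    """In-memory stand-in for the baseline database (illustrative only)."""

    def __init__(self):
        self._db = {}

    @staticmethod
    def key_for(stream_bytes):
        # Assumption: a content hash identifies the encoded video stream.
        return hashlib.sha256(stream_bytes).hexdigest()

    def get_or_create(self, stream_bytes, create_baseline):
        """Return the stored baseline, creating and storing it if missing.

        `create_baseline` is a caller-supplied function, for example one
        that runs multi-decoder consensus verification.
        """
        key = self.key_for(stream_bytes)
        if key not in self._db:
            self._db[key] = create_baseline(stream_bytes)
        return self._db[key]
```

A second request for the same stream returns the stored baseline without invoking `create_baseline` again, which is the cost saving the paragraph describes.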

[0027] In some embodiments, the baseline decoded output or reference decoded output may be determined through consensus verification based on multiple candidate decoded outputs generated by decoding the encoded video stream by multiple video decoders. In some embodiments, consensus verification may include: comparing features extracted from the images of the multiple candidate decoded outputs with a preset baseline to determine a set of target decoded outputs from the multiple candidate decoded outputs, wherein the difference between the extracted features of each candidate decoded output in the set of target decoded outputs and the preset baseline is within a preset range; and selecting a decoded output from the set of target decoded outputs as a reference decoded output.

[0028] As an example, multiple video decoders may include: a recognized software reference decoder (e.g., HM for H.265), and several different but reliable mainstream decoders (e.g., corresponding decoding libraries for FFmpeg, and hardware decoding drivers for mainstream GPUs, etc.). These multiple video decoders can decode the encoded video stream separately to generate multiple candidate decoded outputs (i.e., multiple decoded video streams or decoded image sequences). Features are then extracted from each candidate decoded output image. The extracted features can be based on the feature operators determined in step S102, or on a dedicated set of feature operators used for consensus verification. The preset benchmark can be a pre-set fixed value, or a value calculated based on the features extracted from each candidate decoded output image. For example, this value can be the statistical center (e.g., mean, median, etc.) of the extracted features from all candidate decoded outputs. Subsequently, the extracted features of each candidate decoded output can be compared with the preset benchmark to calculate feature differences, thereby determining a set of target decoded outputs from the multiple candidate decoded outputs. The difference between the extracted features of each candidate decoded output in this set of target decoded outputs and the preset benchmark is within a preset range (e.g., a preset percentage or a preset range interval). Finally, a decoding output can be selected from the set of target decoding outputs as a reference decoding output.
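The consensus verification above can be sketched with the per-dimension median as the preset benchmark and a relative tolerance as the preset range. The function name `consensus_reference`, the 2% tolerance, and the rule of picking the candidate closest to the benchmark are assumptions; the text only requires that one output be selected from the target set.

```python
from statistics import median


def consensus_reference(candidates, tolerance=0.02):
    """Pick a reference among candidate decoders by consensus.

    `candidates` maps a decoder name to its extracted feature vector.
    Returns (reference_name, sorted names in the target set).
    """
    if not candidates:
        raise ValueError("at least one candidate is required")
    dims = len(next(iter(candidates.values())))
    # Benchmark: statistical center (median) of each feature dimension.
    center = [median(vec[d] for vec in candidates.values()) for d in range(dims)]
    # Largest relative deviation from the benchmark over all dimensions.
    deviations = {
        name: max(abs(v - c) / (abs(c) + 1e-12) for v, c in zip(vec, center))
        for name, vec in candidates.items()
    }
    target = {name: dev for name, dev in deviations.items() if dev <= tolerance}
    if not target:
        raise ValueError("no candidate agrees with the consensus benchmark")
    # Assumed selection rule: the candidate closest to the benchmark.
    reference = min(target, key=target.get)
    return reference, sorted(target)


# Hypothetical feature vectors from four decoders; "buggy" is an outlier.
features = {
    "hm": [1.00, 2.00],
    "ffmpeg": [1.01, 2.00],
    "gpu": [1.00, 2.01],
    "buggy": [1.50, 2.00],
}
```

Here the outlier is excluded from the target set, and the reference is drawn from the three decoders that agree within the tolerance.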

[0029] In step S106, based on the feature operator, the features to be tested are extracted from the image of the decoded output under test, and the reference features are extracted from the image of the reference decoded output.

[0030] In step S108, the extracted features to be tested are compared with the reference features to determine whether the decoded output under test is correct. The comparison is based on global structural similarity analysis for detecting structural errors and local noise tolerance analysis for detecting local noise.

[0031] Global structural similarity analysis (GSM) for detecting structural errors can be used to determine whether the image of the decoded output under test has undergone significant distortion or destruction in aspects such as overall structure, contours, and major texture layout. In some embodiments, GSM can be based on one or more of the following: Multi-Scale Structural Similarity Index (MS-SSIM) and Gradient Magnitude Similarity Deviation (GMSD). As an example, comparing the extracted features to be tested with reference features based on GSM may include: calculating a global structural similarity score by comparing the features to be tested with reference features based on metrics such as MS-SSIM or GMSD (these metrics are highly sensitive to structural errors such as edge breaks, large-area color block errors, and object deformation). A global structural similarity score below a threshold indicates the presence of a structural error, thereby determining that the decoded output under test is incorrect.
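To make the GMSD-based variant concrete, the following sketch computes a simplified GMSD on luma images given as lists of rows. It uses 3x3 Prewitt-style gradients and pools the gradient magnitude similarity map by its standard deviation. The constant `c=170.0` is an assumed value for 8-bit samples, and the downsampling step found in common GMSD implementations is omitted, so this is an approximation rather than a reference implementation. Note that GMSD is a deviation (0 for identical images, larger for stronger distortion), so comparing against a lower-bound similarity threshold as in the text would require a mapping such as 1 - GMSD, which is also an assumption.

```python
from math import sqrt
from statistics import pstdev


def gradient_magnitude(img):
    """Prewitt-style gradient magnitude for interior pixels of `img`."""
    h, w = len(img), len(img[0])
    out = []
    for y in range(1, h - 1):
        row = []
        for x in range(1, w - 1):
            gx = sum(img[y + d][x + 1] - img[y + d][x - 1] for d in (-1, 0, 1)) / 3
            gy = sum(img[y + 1][x + d] - img[y - 1][x + d] for d in (-1, 0, 1)) / 3
            row.append(sqrt(gx * gx + gy * gy))
        out.append(row)
    return out


def gmsd(reference, test, c=170.0):
    """Simplified Gradient Magnitude Similarity Deviation (lower is better)."""
    m_ref = gradient_magnitude(reference)
    m_test = gradient_magnitude(test)
    # Per-pixel gradient magnitude similarity, equal to 1 where gradients match.
    gms = [
        (2 * a * b + c) / (a * a + b * b + c)
        for row_ref, row_test in zip(m_ref, m_test)
        for a, b in zip(row_ref, row_test)
    ]
    return pstdev(gms)


# Hypothetical 16x16 image and a copy with a 4x4 region wiped to zero.
ref_img = [[(x * 8 + y * 4) % 256 for x in range(16)] for y in range(16)]
broken_img = [row[:] for row in ref_img]
for y in range(4, 8):
    for x in range(4, 8):
        broken_img[y][x] = 0
```

An identical pair yields a deviation of zero, while the wiped region destroys local gradients and raises the deviation, which is the kind of structural error this stage is meant to catch.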

[0032] After the global structural similarity analysis for detecting structural errors, local noise tolerance analysis for detecting local noise can be used to further determine whether local noise exists in the image of the decoded output under test. This local noise tolerance analysis can filter out compliance differences. In some embodiments, local noise tolerance analysis can be based on one or more of the following: a spatial distribution test and a statistical model fit test. As an example, comparing the extracted features to be tested with the reference features based on local noise tolerance analysis can include: analyzing the statistical properties of the differences between the features to be tested and the reference features to determine whether local noise exists in addition to compliance differences. If the statistical properties of the differences conform to an expected, compliant noise pattern (e.g., rounding errors for a specific hardware platform), then the differences are determined to be compliance differences rather than true local noise, and the decoded output under test can be determined to be correct. As an example, the differences between the features to be tested and the reference features in various dimensions can be calculated to form a difference matrix. This difference matrix is then statistically analyzed to determine whether local noise exists. For example, the statistical analysis can use a spatial distribution test to check whether the differences are randomly distributed in the image space or concentrated and regular; compliance differences are typically randomly distributed. As another example, the statistical analysis can use a statistical model fit test to fit the histogram of the differences to a predefined "compliance noise model" (e.g., a zero-mean Gaussian distribution, a fixed-point error distribution model specific to a particular hardware platform, etc.). If the distribution of the differences deviates significantly from the model, non-compliance differences, i.e., local noise, can be identified, and the decoded output under test is determined to be incorrect. It should be noted that the rules used for the spatial distribution test or the significance level threshold used for the fit test can be adaptively adjusted based on the encoding / decoding parameter information (e.g., compression ratio, etc.), and the compliance noise model can be a general model matched and loaded from a pre-built model library based on the hardware platform type and operating mode in the test configuration information, or a specialized model generated by learning from the platform's historical decoded output data.
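The two checks in this paragraph can be approximated with very simple statistics. The sketch below builds a difference matrix, measures how much of the total absolute difference sits in the single worst block (a crude spatial distribution test), and checks that the differences have a near-zero mean with few samples beyond three standard deviations (a crude stand-in for a Gaussian fit test). All function names and thresholds are illustrative assumptions, and a real implementation would likely use a formal goodness-of-fit test and a platform-specific noise model.

```python
from statistics import mean


def difference_matrix(test, reference):
    """Element-wise difference between test and reference feature maps."""
    return [[t - r for t, r in zip(tr, rr)] for tr, rr in zip(test, reference)]


def spatial_concentration(diff, block=4):
    """Share of the total absolute difference carried by the worst block."""
    h, w = len(diff), len(diff[0])
    sums = [
        sum(abs(diff[y][x])
            for y in range(by, min(by + block, h))
            for x in range(bx, min(bx + block, w)))
        for by in range(0, h, block)
        for bx in range(0, w, block)
    ]
    total = sum(sums)
    return max(sums) / total if total else 0.0


def fits_noise_model(diff, sigma, max_mean=0.5, max_outlier_share=0.01):
    """Crude zero-mean Gaussian check with an assumed, known `sigma`."""
    flat = [d for row in diff for d in row]
    outlier_share = sum(abs(d) > 3 * sigma for d in flat) / len(flat)
    return abs(mean(flat)) <= max_mean and outlier_share <= max_outlier_share


def is_compliance_difference(diff, sigma=1.0, max_concentration=0.5):
    """True if differences look spread out and consistent with the model."""
    return (spatial_concentration(diff) <= max_concentration
            and fits_noise_model(diff, sigma))


# Hypothetical 8x8 feature maps: +/-1 rounding noise versus a damaged corner.
ref_map = [[100] * 8 for _ in range(8)]
rounding_map = [[100 + (1 if (x + y) % 2 == 0 else -1) for x in range(8)]
                for y in range(8)]
damaged_map = [[140 if x < 4 and y < 4 else 100 for x in range(8)]
               for y in range(8)]
```

The rounding pattern spreads evenly over all blocks and stays within the assumed model, so it is treated as a compliance difference, while the damaged corner concentrates all of the difference in one block and is flagged as local noise.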

[0033] If the global structural similarity score indicates a structural error, the decoded output under test is determined to be incorrect. If the global structural similarity score indicates no structural error, local noise tolerance analysis is then performed: if non-compliance differences (i.e., local noise) are identified, the decoded output under test is determined to be incorrect; if only compliance differences are identified, the decoded output under test is determined to be correct.
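The two-stage decision can be written as a short function. In this sketch the local analysis is passed in as a callable so that it runs only when the global stage passes; the names `Verdict` and `decide` and the use of a callable are illustrative assumptions.

```python
from enum import Enum


class Verdict(Enum):
    CORRECT = "correct"
    STRUCTURAL_ERROR = "incorrect: structural error"
    LOCAL_NOISE = "incorrect: local noise"


def decide(global_score, score_threshold, differences_are_compliant):
    """Combine the global and local stages into one verdict.

    `global_score` is a similarity score where higher means more similar.
    `differences_are_compliant` is a zero-argument callable that performs
    the local noise tolerance analysis and returns True or False.
    """
    if global_score < score_threshold:
        return Verdict.STRUCTURAL_ERROR   # stage 1 fails, stage 2 is skipped
    if not differences_are_compliant():
        return Verdict.LOCAL_NOISE        # stage 2 finds non-compliant noise
    return Verdict.CORRECT
```

For example, `decide(0.99, 0.95, lambda: True)` yields `Verdict.CORRECT`, whereas a score below the threshold returns a structural error without running the local stage.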

[0034] In some embodiments, method 100 may further include: if it is determined that the decoded output under test is incorrect, generating a difference report indicating the location and type of the error. In some embodiments, the difference report may include a difference feature heatmap to locate the error to a macroblock (MB) or slice level in the image.
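A difference feature heatmap at macroblock granularity can be obtained by averaging the absolute differences inside each block. The sketch below assumes 16x16 macroblocks and a per-pixel difference matrix; the function names and the threshold are hypothetical.

```python
def macroblock_heatmap(diff, mb=16):
    """Mean absolute difference per macroblock, as a 2D list."""
    h, w = len(diff), len(diff[0])
    heat = []
    for by in range(0, h, mb):
        row = []
        for bx in range(0, w, mb):
            cells = [abs(diff[y][x])
                     for y in range(by, min(by + mb, h))
                     for x in range(bx, min(bx + mb, w))]
            row.append(sum(cells) / len(cells))
        heat.append(row)
    return heat


def flagged_macroblocks(heat, threshold):
    """(row, column) indices of macroblocks whose mean exceeds `threshold`."""
    return [(r, c)
            for r, row in enumerate(heat)
            for c, value in enumerate(row)
            if value > threshold]


# Hypothetical 32x32 difference matrix with a single damaged pixel.
diff = [[0] * 32 for _ in range(32)]
diff[20][5] = 64
heat = macroblock_heatmap(diff)
```

The damaged pixel at row 20, column 5 falls into the macroblock at index (1, 0), which is the only block that `flagged_macroblocks(heat, 0.1)` reports.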

[0035] Embodiments of this application may include a non-transitory computer-readable medium. The medium stores instructions that, when executed by one or more processors, cause the processors to perform the methods described herein.

[0036] Embodiments of this application may also include a computer program product. This computer program product includes instructions that, when executed by one or more processors, cause the processors to perform the methods described herein.

[0037] The method for testing video decoders according to embodiments of this application can distinguish between compliance differences and real decoding errors, can automatically adapt to different test scenarios or different hardware implementation platforms, and can dynamically establish reliable verification benchmarks, thereby significantly improving testing efficiency and accuracy.

[0038] Figure 2 shows a schematic diagram of a computing device 200 that can implement embodiments according to this application. The computing device 200 can be used to perform the various methods described above in conjunction with embodiments of this application, for example, the method described in conjunction with Figure 1. As shown in Figure 2, computing device 200 may include a bus 202 or other communication mechanism for transmitting information, and one or more processors 204 coupled to bus 202 for processing information. The one or more processors 204 may include, for example, one or more general-purpose microprocessors.

[0039] As shown in Figure 2, in some embodiments, computing device 200 may further include main memory 206 coupled to bus 202, which is used to store information and instructions executed by one or more processors 204. For example, main memory 206 includes, but is not limited to, random access memory (RAM), cache, and / or other dynamic storage devices. Main memory 206 may also be used to store temporary variables or other intermediate information during the execution of instructions by one or more processors 204. When these instructions are stored in a storage medium accessible to one or more processors 204, they can cause computing device 200 to become a dedicated machine customized to perform the operations specified in the instructions. Storage device 208 may include non-volatile and / or volatile storage media. Non-volatile storage media may include, for example, optical disks or magnetic disks. Volatile storage media may include dynamic memory. Common forms of storage media may include, for example, floppy disks, hard disks, solid-state drives, magnetic tapes, or any other magnetic data storage media, CD-ROMs, any other optical data storage media, any physical media with patterns of holes, RAM, DRAM, PROM, EPROM, FLASH-EPROM, NVRAM, any other memory chip or cartridge, or networked versions thereof.

[0040] As shown in Figure 2, in some embodiments, computing device 200 may further include one or more communication interfaces or network interfaces 210 coupled to bus 202. Network interface 210 may provide bidirectional data communication coupling to one or more network links connected to one or more networks. As an example, network interface 210 may be a local area network (LAN) card that provides a data communication connection to a compatible LAN (or a WAN component that communicates with a WAN). Wireless links may also be implemented.

[0041] The various processes, methods, and algorithms described in the preceding sections can be embodied in code modules executed by one or more computer systems or computer processors including computer hardware, and can be fully or partially automated by these code modules. The processes and algorithms can be implemented, partially or fully, in dedicated circuit systems.

[0042] When the functions disclosed herein are implemented as software functional units and sold or used as standalone products, they may be stored in a processor-executable, non-volatile, computer-readable storage medium. Specific technical solutions (all or part) disclosed herein, or aspects that contribute over the prior art, may be embodied in the form of a software product. The software product may be stored in a storage medium and includes instructions to cause a computing device (which may be a personal computer, server, network device, etc.) to perform all or some steps of the methods of the embodiments of this application. The storage medium may include a flash drive, hard disk drive, ROM, RAM, magnetic disk, optical disk, other media operable to store program code, or any combination thereof.

[0043] Some embodiments further provide a system including a processor and a non-transitory computer-readable storage medium storing instructions executable by the processor to cause the system to perform operations corresponding to steps in any method of the embodiments disclosed above. Specific embodiments further provide a non-transitory computer-readable storage medium storing instructions executable by one or more processors to cause the one or more processors to perform operations corresponding to steps in any method of the embodiments disclosed above.

[0044] The embodiments disclosed herein can be implemented via a cloud platform, server, or server cluster (collectively referred to as the “service system”) that interacts with a client. The client can be a terminal device or a client registered by a user at the platform, wherein the terminal device can be a mobile terminal, a personal computer (PC), or any device capable of installing platform applications.

[0045] The various features and processes described above can be used independently of each other or combined in various ways. All possible combinations and sub-combinations should be considered to fall within the scope of this disclosure. Additionally, certain methods or processes may be omitted in some embodiments. The methods and processes described herein are not limited to any particular order, and the blocks or states associated with them may be executed in other suitable orders. For example, the described blocks or states may be executed in an order other than that specifically disclosed, or multiple blocks or states may be combined into a single block or state. Example blocks or states may be executed sequentially, in parallel, or in some other manner. Blocks or states may be added to or removed from the disclosed example embodiments. The exemplary systems and components described herein may be configured differently than described. For example, components may be added to, removed from, or rearranged compared to the disclosed example embodiments.

[0046] The various operations of the exemplary methods described herein can be performed at least in part by an algorithm. The algorithm may be included in program code or instructions stored in memory (e.g., the aforementioned non-transitory computer-readable storage medium). The algorithm may include a machine learning algorithm. In some embodiments, the machine learning algorithm may not explicitly program the computer to perform a function but may learn from training data to produce a predictive model that performs the function.

[0047] The various operations of the exemplary methods described herein can be performed, at least in part, by one or more processors that are temporarily configured (e.g., by software) or permanently configured to perform the relevant operations. Whether temporarily or permanently configured, these processors can constitute processor-implemented engines that operate to perform one or more of the operations or functions described herein.

[0048] Similarly, the methods described herein can be implemented at least in part by a processor, where one or more specific processors are instances of hardware. For example, at least some operations of the methods can be performed by one or more processors or an engine implemented by a processor. Furthermore, one or more processors can also operate to support the execution of related operations in a “cloud computing” environment or as the execution of related operations in a “Software as a Service” (SaaS) context. For example, at least some operations can be performed by a group of computers (as an example of a machine containing processors), where these operations are accessible via a network (e.g., the Internet) and via one or more appropriate interfaces (e.g., application programming interfaces (APIs)).

[0049] The execution of certain operations can be distributed across processors rather than residing within a single machine, and can be deployed across multiple machines. In some example embodiments, the processor or processor-implemented engine may reside in a single geographic location (e.g., in a home environment, office environment, or server farm). In some embodiments, the processor or processor-implemented engine may be distributed across multiple geographic locations.

[0050] Throughout this specification, multiple instances may implement components, operations, or structures described as a single instance. Although individual operations of one or more methods are illustrated and described as separate operations, one or more of these individual operations may be performed simultaneously, and they need not be performed in the order illustrated. Structures and functions presented as separate components in the example configurations may be implemented as a combined structure or component. Similarly, structures and functions presented as a single component may be implemented as separate components. These and other variations, modifications, additions, and improvements also fall within the scope of this document.

[0051] As used herein, "or" is inclusive rather than exclusive unless the context explicitly indicates otherwise. Therefore, in this document, "A, B, or C" means "A, B, C, A and B, A and C, B and C, or A, B, and C" unless the context explicitly indicates otherwise. Furthermore, "and" is both joint and several unless the context explicitly indicates otherwise. Therefore, in this document, "A and B" means "A and B, jointly or severally" unless the context explicitly indicates otherwise. Additionally, multiple instances may be provided for resources, operations, or structures described herein as a single instance. Furthermore, the boundaries between various resources, operations, engines, and data storage devices are somewhat arbitrary, and specific operations are illustrated within the context of a particular illustrative configuration. Other allocations of functionality are envisioned and fall within the scope of various embodiments of this disclosure. Generally, structures and functions presented as separate resources in example configurations may be implemented as a combined structure or resource. Similarly, structures and functions presented as a single resource may be implemented as separate resources. These and other changes, modifications, additions, and improvements fall within the scope of the embodiments of this disclosure as expressed in the appended claims. Therefore, this specification and the drawings should be considered illustrative rather than restrictive.

[0052] The terms “comprising” and “including” are used to indicate the presence of the subsequently stated features, but do not preclude the addition of other features. Unless specifically stated otherwise, or otherwise understood from the context in which it is used, conditional language such as “may” or “can” is generally intended to convey that certain embodiments include, while other embodiments do not include, certain features, components, and/or steps. Therefore, this conditional language is generally not intended to imply that features, components, and/or steps are in any way required by one or more embodiments, or that one or more embodiments must include logic for determining, with or without user input or prompts, whether such features, components, and/or steps are included in or performed in any particular embodiment.

[0053] Although an overview of the subject matter has been described with reference to specific exemplary embodiments, various modifications and changes may be made to these embodiments without departing from the broader scope of embodiments of this disclosure. These embodiments of the subject matter may be referred to herein, individually or collectively, by the term "invention" merely for convenience and without intending to limit the scope of this application to any single disclosure or concept where more than one is disclosed.

[0054] The embodiments illustrated herein are described in sufficient detail to enable those skilled in the art to practice the disclosed teachings. Other embodiments may be used and derived therefrom, such that structural and logical substitutions and changes may be made without departing from the scope of this disclosure. Therefore, this detailed description is not to be taken in a limiting sense, and the scope of the various embodiments is defined only by the appended claims and their full scope.

Claims

1. A method for testing a video decoder, characterized in that the method comprises:
determining, based on encoding/decoding parameter information corresponding to an encoded video stream input to a video decoder under test and on test configuration information, a feature operator for verifying the correctness of a decoded output under test, wherein the decoded output under test is generated by the video decoder under test decoding the encoded video stream;
obtaining a reference decoded output corresponding to the encoded video stream, wherein the reference decoded output is determined by consensus verification based on multiple candidate decoded outputs generated by multiple video decoders decoding the encoded video stream, and wherein the consensus verification includes: comparing features extracted from images of the multiple candidate decoded outputs with a preset benchmark to determine a set of target decoded outputs from the multiple candidate decoded outputs, wherein the difference between the extracted features of each candidate decoded output in the set of target decoded outputs and the preset benchmark is within a preset range; and selecting a decoded output from the set of target decoded outputs as the reference decoded output;
extracting, based on the feature operator, a feature under test from an image of the decoded output under test and a reference feature from an image of the reference decoded output; and
comparing the feature under test with the reference feature to determine whether the decoded output under test is correct, wherein the comparison is based on a global structural similarity analysis for detecting structural errors and a local noise tolerance analysis for detecting local noise.

2. The method according to claim 1, characterized in that the encoding/decoding parameter information includes one or more of the following: video encoding/decoding format, resolution, frame rate, bitstream format, compression rate, and dynamic range.

3. The method according to claim 1, characterized in that the test configuration information includes one or more of the following: the type of hardware platform running the video decoder under test, and the operating mode of the video decoder under test.

4. The method according to claim 1, characterized in that the feature operator includes one or more of the following: edge features, color histograms, deep learning features, and blocking-artifact detection features.

5. The method according to claim 1, characterized in that obtaining the reference decoded output corresponding to the encoded video stream includes:
querying a database to determine whether a baseline decoded output corresponding to the encoded video stream exists;
if it exists, reading the baseline decoded output as the reference decoded output; and
if it does not exist, creating the baseline decoded output from the multiple candidate decoded outputs through the consensus verification to serve as the reference decoded output, and storing the baseline decoded output in the database.

6. The method according to claim 1, characterized in that the method further comprises: if the decoded output under test is determined to be incorrect, generating a difference report indicating the location and type of the error.

7. The method according to claim 6, characterized in that the difference report includes a feature-difference heatmap for locating errors at the macroblock level or slice level within the image.

8. The method according to claim 1, characterized in that the global structural similarity analysis is based on one or more of the following: a multi-scale structural similarity index and a gradient magnitude similarity deviation.

9. The method according to claim 1, characterized in that the local noise tolerance analysis is based on one or more of the following: a spatial distribution test and a statistical model fit test.

10. A non-transitory computer-readable medium storing instructions, characterized in that, when executed by one or more processors, the instructions cause the one or more processors to perform the method according to any one of claims 1-9.

11. A computer program product comprising instructions, characterized in that, when executed by one or more processors, the instructions cause the one or more processors to perform the method according to any one of claims 1-9.
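
By way of non-limiting illustration only, the consensus verification and the global-local comparison recited in claim 1 might be sketched in Python as follows. Everything specific in this sketch is an assumption made for illustration and is not taken from the claims: the function names, the thresholds, the use of mean luma as the feature operator, a single-scale whole-frame SSIM standing in for the multi-scale structural similarity index of claim 8, and a per-block fraction of out-of-tolerance pixels standing in for the spatial distribution test of claim 9. Frames are modeled as small grayscale grids so that the example needs only the standard library.

```python
from statistics import mean

# A grayscale frame as rows of pixel values (assumed representation).
Frame = list[list[float]]


def mean_feature(frame: Frame) -> float:
    """Hypothetical feature operator: mean luma of the frame."""
    return mean(p for row in frame for p in row)


def select_reference(candidates: list[Frame], benchmark: float, tolerance: float) -> Frame:
    """Consensus verification: keep candidates whose feature is within
    `tolerance` of the preset benchmark, then select one of them."""
    targets = [c for c in candidates if abs(mean_feature(c) - benchmark) <= tolerance]
    if not targets:
        raise ValueError("no candidate decoded output agrees with the benchmark")
    # Assumed selection rule: the target closest to the benchmark.
    return min(targets, key=lambda c: abs(mean_feature(c) - benchmark))


def global_ssim(x: Frame, y: Frame, peak: float = 255.0) -> float:
    """Single-scale SSIM computed from whole-frame statistics
    (a simplified stand-in for a multi-scale index)."""
    xs = [p for row in x for p in row]
    ys = [p for row in y for p in row]
    mx, my = mean(xs), mean(ys)
    vx = mean((a - mx) ** 2 for a in xs)
    vy = mean((b - my) ** 2 for b in ys)
    cov = mean((a - mx) * (b - my) for a, b in zip(xs, ys))
    c1, c2 = (0.01 * peak) ** 2, (0.03 * peak) ** 2  # usual SSIM stabilizers
    return ((2 * mx * my + c1) * (2 * cov + c2)) / ((mx**2 + my**2 + c1) * (vx + vy + c2))


def flagged_blocks(x: Frame, y: Frame, block: int = 2,
                   pixel_tol: float = 2.0, max_bad_fraction: float = 0.5) -> list[tuple[int, int]]:
    """Local noise tolerance: isolated small deviations are tolerated; a block
    is flagged only when most of its pixels deviate by more than `pixel_tol`.
    The returned (block_row, block_col) list acts as a coarse difference map."""
    flagged = []
    height, width = len(x), len(x[0])
    for top in range(0, height, block):
        for left in range(0, width, block):
            diffs = [abs(x[r][c] - y[r][c])
                     for r in range(top, min(top + block, height))
                     for c in range(left, min(left + block, width))]
            bad = sum(d > pixel_tol for d in diffs)
            if bad / len(diffs) > max_bad_fraction:
                flagged.append((top // block, left // block))
    return flagged


def is_correct(test: Frame, reference: Frame, ssim_min: float = 0.95) -> bool:
    """The output passes only if both the global and the local analysis pass."""
    return global_ssim(test, reference) >= ssim_min and not flagged_blocks(test, reference)
```

In this sketch, a frame that differs from the reference by a one-level deviation in a single pixel passes both checks, whereas a frame whose top-left 2x2 block is overwritten is reported at block position (0, 0). A practical implementation would more likely operate on full-resolution frames with an array library and would choose block sizes matching the macroblock or slice structure of the codec under test.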