Loop filtering method and apparatus, computer device and readable storage medium

By dividing each decoded macroblock into multiple target units and mapping their addresses, this application optimizes the on-chip cache of the loop filter module in a hardware video decoder and addresses the limited execution performance of prior-art loop filter modules. Filter boundary data can be read and written within one clock cycle, improving the execution efficiency of the hardware video decoder.

CN122269050A · Pending · Publication Date: 2026-06-23 · GLENFLY TECH CO LTD

Patent Information

Authority / Receiving Office
CN · China
Patent Type
Applications(China)
Current Assignee / Owner
GLENFLY TECH CO LTD
Filing Date
2026-04-23
Publication Date
2026-06-23

AI Technical Summary

Technical Problem

The on-chip cache (SRAM) of the loop filter module in existing hardware video decoders only supports single read and write operations, which limits the performance of loop filtering and makes it impossible to complete the read and write operations of filter boundary data within one clock cycle.

Method used

By dividing the decoded macroblock into multiple target units and performing address mapping based on associated attribute information, N target units are read or written in parallel in each access. This optimizes the address mapping of the on-chip cache so that the filtering data at vertical or horizontal boundaries can be read and written within one clock cycle.

Benefits of technology

The filtering performance of the loop filtering module has been improved, enabling the reading and writing of filtering boundary data to be completed within one clock cycle, thereby enhancing the execution efficiency of the hardware video decoder.

✦ Generated by Eureka AI based on patent content.

Smart Images

  • Figure CN122269050A_ABST

Abstract

This application relates to a loop filtering method, apparatus, computer device, and readable storage medium. The method includes: when reading reconstructed pixel data from an on-chip cache or writing filtered pixel data to an on-chip cache, processing N target units of a decoded macroblock each time; obtaining the associated attribute information of the target sub-blocks to which the N target units belong, including first position information, first size information, and color component information of the target decoding tree block to which the target sub-block belongs, second position information of the target decoding macroblock to which the target sub-block belongs in the target decoding tree block, and third position information of the target sub-block in the target decoding macroblock; performing address mapping based on the associated attribute information to obtain the target position information corresponding to the target sub-blocks to which the N target units belong in the on-chip cache, and then performing the reading of reconstructed pixel data or the writing of filtered pixel data. This method can improve the performance of loop filtering.

Description

Technical Field

[0001] This application relates to the field of video encoding and decoding technology, and in particular to a loop filtering method, apparatus, computer equipment, computer-readable storage medium, and computer program product.

Background Technology

[0002] The loop filtering module is a decoding submodule of the hardware video decoder. Its main function is to perform filtering operations on the reconstructed pixel data to eliminate blockiness and ringing artifacts in the decoded image, thereby improving the image quality. The hardware design of the on-chip cache (Static Random Access Memory, SRAM) of the loop filtering module in the hardware video decoder has a crucial impact on the hardware resources and decoding performance of the hardware decoder.

[0003] In related technologies, the on-chip cache (SRAM) of the loop filter module supports only single read and write operations, so the filtering data at a loop filter boundary must be read and written over multiple clock cycles, which severely limits loop filter performance.

Summary of the Invention

[0004] Therefore, it is necessary to provide a loop filtering method, apparatus, computer equipment, computer-readable storage medium, and computer program product that can improve the performance of loop filtering in order to address the above-mentioned technical problems.

[0005] Firstly, this application provides a loop filtering method, including:

[0006] When reading reconstructed pixel data from the on-chip cache or writing filtered pixel data to the on-chip cache, N target units of a decoded macroblock are processed each time; wherein, each decoded tree block can be divided into M decoded macroblocks, a decoded macroblock can be divided into N sub-blocks, each sub-block can be divided into N target units, the N target units belong to the same sub-block, and each sub-block has a different address in the on-chip cache;

[0007] Obtain the associated attribute information of the target sub-blocks to which the N target units belong; the associated attribute information includes the first position information, the first size information, and the color component information of the target decoding tree block to which the target sub-block belongs, the second position information of the target decoding macroblock to which the target sub-block belongs in the target decoding tree block, and the third position information of the target sub-block in the target decoding macroblock; the first position information is the coordinate of the target decoding tree block in the width direction of the decoded image;

[0008] Address mapping is performed based on the associated attribute information to obtain the target location information corresponding to the target sub-blocks to which the N target units belong in the on-chip cache. The reconstructed pixel data is read or the filtered pixel data is written based on the target location information.

[0009] In one embodiment, the number N of target units processed each time is 4; the address of the target unit divided from each decoded macroblock in the on-chip cache satisfies the following read / write request conditions:

[0010] The N target units along the horizontal direction of the decoded macroblock have different mapping addresses in their respective sub-blocks, so that the filtering data at a vertical filtering boundary can be read and written within one clock cycle;

[0011] The N target units along the vertical direction of the decoded macroblock have different mapping addresses in their respective sub-blocks, so that the filtering data at a horizontal filtering boundary can be read and written within one clock cycle;

[0012] The target units of each 2×2 group have different mapping addresses in their respective sub-blocks, so that they can be read and written within one clock cycle;

[0013] Furthermore, when a rectangle the size of a decoded macroblock is shifted diagonally by 2×2 target units, the mapping addresses of the target units in the N sub-blocks inside the shifted rectangle are the same as those of the target units in the N sub-blocks of the decoded macroblock at the starting position.
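The four conditions can be checked mechanically on a 4×4 table of SRAM assignments, one entry per 4×4-pixel target unit of a 16×16 macroblock. The sketch below is illustrative only. The table `PATTERN` is an assumption reconstructed from the row and column sequences given in [0100] and [0101] (its lower-right 2×2 entries are inferred from the uniqueness constraints), condition (3) is read as applying to the 2×2 target units of each 8×8 sub-block, and condition (4) is read as requiring that the same table reappears when the macroblock-sized window is shifted by two target units in both directions. The names `PATTERN`, `bank`, and `check_conditions` are hypothetical.

```python
# Hypothetical 4x4 table: PATTERN[row][col] is the SRAM index (s0..s3 -> 0..3)
# of each 4x4-pixel target unit in a 16x16 macroblock. Rows 0-1 and columns 0-1
# follow [0100]-[0101]; the lower-right 2x2 entries are inferred.
PATTERN = [
    [0, 1, 2, 3],
    [3, 2, 1, 0],
    [2, 3, 0, 1],
    [1, 0, 3, 2],
]


def bank(pattern, row, col):
    """SRAM index of the target unit at (row, col), assuming the same
    4x4 table repeats in every macroblock (periodic tiling)."""
    return pattern[row % 4][col % 4]


def check_conditions(pattern):
    n = 4
    # Condition 1: the four units in each row use four different SRAMs.
    rows_ok = all(len({bank(pattern, r, c) for c in range(n)}) == n for r in range(n))
    # Condition 2: the four units in each column use four different SRAMs.
    cols_ok = all(len({bank(pattern, r, c) for r in range(n)}) == n for c in range(n))
    # Condition 3: the 2x2 units of each 8x8 sub-block use four different SRAMs.
    blocks_ok = all(
        len({bank(pattern, r + dr, c + dc) for dr in (0, 1) for dc in (0, 1)}) == n
        for r in (0, 2)
        for c in (0, 2)
    )
    # Condition 4 (as interpreted here): a window shifted diagonally by
    # 2x2 units sees the same table as the window at the starting position.
    shift_ok = all(
        bank(pattern, r + 2, c + 2) == bank(pattern, r, c)
        for r in range(n)
        for c in range(n)
    )
    return rows_ok, cols_ok, blocks_ok, shift_ok


print(check_conditions(PATTERN))  # (True, True, True, True)
```

A single shared SRAM corresponds to an all-zero table, which fails conditions (1) to (3) while trivially passing (4).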

[0014] In one embodiment, the step of performing address mapping based on the associated attribute information to obtain the target location information corresponding to the target sub-blocks to which the N target units belong in the on-chip cache includes:

[0015] Based on the color component information, determine the fourth position information of the target decoding macroblock in the on-chip cache and the fifth position information of each sub-block in the target decoding macroblock in the on-chip cache;

[0016] Based on the first position information, the first size information, and the color component information of the target decoding tree block, together with the fourth position information, obtain the address of the target decoding macroblock in the on-chip cache;

[0017] Based on the address of the target decoding macroblock in the on-chip cache and the fifth position information of each sub-block, obtain the target location information, in the on-chip cache, of the target sub-blocks to which the N target units belong.

[0018] In one embodiment, determining the fourth position information of the target decoding macroblock based on the color component information includes:

[0019] When the color component information is a luminance component, the fourth position information is the same as the second position information;

[0020] When the color component information is a chroma component, the second position information is remapped to obtain the fourth position information.

[0021] In one embodiment, determining the fifth location information of each sub-block in the target decoding macroblock in the on-chip cache based on the color component information includes:

[0022] When the color component information is a luminance component, the fifth position information is the same as the third position information;

[0023] When the color component information is the chroma component Cb, the fifth position information is determined using a first address mapping relationship;

[0024] When the color component information is the chroma component Cr, the fifth position information is determined using a second address mapping relationship.

[0025] In one embodiment, the step of reading the reconstructed pixel data or writing the filtered pixel data based on the target location information includes:

[0026] Based on the target location information, access the target sub-blocks to which the N target units belong in the on-chip cache to obtain the corresponding data blocks;

[0027] The location of the filter boundary of the target decoding macroblock and the location of the adjacent input pixel data on both sides of the filter boundary in the data block are determined in order to read the reconstructed pixel data or write the filtered pixel data.

[0028] In one embodiment, determining the position of the filter boundary of the target decoding macroblock and the positions of the adjacent input pixel data on both sides of the filter boundary within the data block includes:

[0029] For vertical boundary filtering, determine the parity of the number of 1s in the binary representation of the Id of the 8×4 block in the target decoding macroblock, and based on the parity result, determine the addresses, within the data block, of the input pixel data required for vertical boundary filtering;

[0030] For horizontal boundary filtering, determine the parity of the number of 1s in the binary representation of the Id of the 4×8 block in the target decoding macroblock, and based on the parity result, determine the addresses, within the data block, of the input pixel data required for horizontal boundary filtering.
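The parity test in [0029] and [0030] reduces to counting the 1 bits of a block Id. A minimal sketch follows, assuming the 8×4 (or 4×8) blocks of a 16×16 macroblock are numbered 0 to 7; how the parity result selects addresses inside the data block is described with Figures 10 and 11 and is not reproduced here. The function name `ones_parity` is hypothetical.

```python
def ones_parity(block_id: int) -> int:
    """Return 0 if block_id has an even number of 1 bits, 1 if odd."""
    if block_id < 0:
        raise ValueError("block Id must be non-negative")
    return bin(block_id).count("1") % 2


# Eight 8x4 blocks in a 16x16 macroblock, Ids 0..7 (numbering assumed).
print([ones_parity(i) for i in range(8)])  # [0, 1, 1, 0, 1, 0, 0, 1]
```

In hardware the same value is simply the XOR of the Id bits, which is why the test is cheap enough to evaluate per boundary.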

[0031] Secondly, this application also provides a loop filter device, comprising:

[0032] The target determination module is used to process N target units of a decoded macroblock each time when reading reconstructed pixel data from the on-chip cache or writing filtered pixel data into the on-chip cache; wherein each decoded tree block can be divided into M decoded macroblocks, a decoded macroblock can be divided into N sub-blocks, each sub-block can be divided into N target units, the N target units belong to the same sub-block, and the addresses of each sub-block in the on-chip cache are different;

[0033] An information acquisition module is used to acquire the associated attribute information of the target sub-blocks to which the N target units belong; the associated attribute information includes the first position information, the first size information, and the color component information of the target decoding tree block to which the target sub-block belongs, the second position information of the target decoding macroblock to which the target sub-block belongs in the target decoding tree block, and the third position information of the target sub-block in the target decoding macroblock; the first position information is the coordinate of the target decoding tree block in the width direction of the decoded image;

[0034] The address mapping module is used to perform address mapping based on the associated attribute information, obtain the target location information corresponding to the target sub-blocks to which the N target units belong in the on-chip cache, and read the reconstructed pixel data or write the filtered pixel data based on the target location information.

[0035] Thirdly, this application also provides a computer device, including a memory and a processor, wherein the memory stores a computer program, and the processor executes the computer program to implement the method as described in any of the preceding claims.

[0036] Fourthly, this application also provides a computer-readable storage medium having a computer program stored thereon, which, when executed by a processor, implements the method described in any of the preceding claims.

[0037] Fifthly, this application also provides a computer program product, including a computer program that, when executed by a processor, implements the method described in any of the preceding claims.

[0038] The above loop filtering method, apparatus, computer device, computer-readable storage medium, and computer program product establish an address mapping relationship between data blocks and the on-chip cache. When reconstructed pixel data is read from the on-chip cache or filtered pixel data is written into it, N target units of a decoded macroblock are processed each time, enabling N SRAMs to perform parallel read or write operations within one clock cycle. For example, the loop filtering module can read or write the filtering data required for a vertical or horizontal boundary within one clock cycle, thereby improving its filtering performance.

Attached Figure Description

[0039] To more clearly illustrate the technical solutions in the embodiments of this application or related technologies, the drawings used in the description of the embodiments of this application or related technologies will be briefly introduced below. Obviously, the drawings described below are only some embodiments of this application. For those skilled in the art, other related drawings can be obtained based on these drawings without creative effort.

[0040] Figure 1 is a schematic diagram of the data flow between the on-chip cache and external memory of a loop filter module in the prior art;

[0041] Figure 2 is an overall architecture diagram of the 4-SRAM loop filter module in one embodiment;

[0042] Figure 3 is a flowchart of a loop filtering method in one embodiment;

[0043] Figure 4 is a schematic diagram of the relationship between decoding tree blocks, decoding macroblocks, and target units in one embodiment;

[0044] Figure 5 is a schematic diagram of a 64×64-pixel CTU in one embodiment;

[0045] Figure 6 is a schematic diagram of the 4-SRAM address mapping of a decoding macroblock in one embodiment;

[0046] Figure 7 is a flowchart of determining, through address mapping based on the associated attribute information of a target sub-block, the target location information of the target sub-block in the on-chip cache in one embodiment;

[0047] Figure 8 is a schematic diagram of performing address mapping on a target sub-block, based on its associated attribute information, to determine its target location information in the on-chip cache in an application instance;

[0048] Figure 9 is a schematic diagram of the basic data units for filtering in one embodiment;

[0049] Figure 10 is a schematic diagram of calculating SRAMn for a vertical filtering boundary in 4-SRAM group address mapping in one embodiment;

[0050] Figure 11 is a schematic diagram of calculating SRAMn for a horizontal filtering boundary in 4-SRAM group address mapping in one embodiment;

[0051] Figure 12 is a structural block diagram of a loop filter device in one embodiment;

[0052] Figure 13 is an internal structural diagram of a computer device in one embodiment.

Detailed Implementation

[0053] To make the objectives, technical solutions, and advantages of this application clearer, the following detailed description is provided in conjunction with the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative and not intended to limit the scope of this application.

[0054] It should be noted that the terms "first," "second," etc., in the specification, claims, and accompanying drawings of this application are used to distinguish similar objects and are not necessarily used to describe a specific order or sequence. It should be understood that such data can be interchanged where appropriate so that the embodiments of this application described herein can be implemented in orders other than those illustrated or described herein. The terms "comprising" and "having," and any variations thereof, as used in this application, are intended to cover non-exclusive inclusion. The term "a plurality" as used in this application refers to two or more. The term "and / or" as used in this application refers to one of the embodiments or any combination of multiple embodiments.

[0055] Before introducing the specific embodiments of this application, the technical terms and background of loop filtering involved in this application will be explained.

[0056] CTU: Coding Tree Unit, the largest processing unit in video coding, also called a decoding tree block. It can be understood as the largest block into which a video frame is divided. Its size is either 128×128 pixels or 64×64 pixels and is indicated by ctu128: when the CTU is 128×128 pixels, ctu128 = 1; when the CTU is 64×64 pixels, ctu128 = 0.

[0057] MB: Macroblock, the basic filtering unit; its size depends on the design of each vendor's filtering algorithm. A CTU can be divided into multiple MBs.

[0058] For ease of explanation, this application describes the scheme with an MB size of 16×16 pixels. A 64×64-pixel CTU can be divided into 16 MBs, and the index of each MB (denoted as …) ranges from 0 to 15; a 128×128-pixel CTU can be divided into 64 MBs, and the index of each MB ranges from 0 to 63.

[0059] Each MB is further divided into 8×8-pixel sub-blocks, and the index of each sub-block (denoted as …) takes the values 0, 1, 2, 3.
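With the sizes used in [0058] and [0059], the position of any pixel inside a CTU can be decomposed into an MB index, a sub-block index, and a target-unit position. The sketch assumes raster-scan numbering of MBs within the CTU and of sub-blocks within the MB; the numbering actually used is defined by the figures and may differ. `locate` is a hypothetical helper.

```python
def locate(x: int, y: int, ctu128: int):
    """Map pixel (x, y) inside a CTU to
    (mb_index, sub_block_index, unit_row, unit_col).

    Assumes raster-scan numbering of 16x16 MBs within the CTU and of
    8x8 sub-blocks within the MB; unit_row/unit_col locate the 4x4
    target unit inside its MB (0..3 each).
    """
    ctu_size = 128 if ctu128 else 64
    if not (0 <= x < ctu_size and 0 <= y < ctu_size):
        raise ValueError("pixel lies outside the CTU")
    mbs_per_row = ctu_size // 16
    mb_index = (y // 16) * mbs_per_row + (x // 16)  # 0..15 or 0..63
    sub_block_index = ((y % 16) // 8) * 2 + ((x % 16) // 8)  # 0..3
    unit_row, unit_col = (y % 16) // 4, (x % 16) // 4
    return mb_index, sub_block_index, unit_row, unit_col


print(locate(20, 9, ctu128=0))  # (1, 2, 2, 1)
```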

[0060] It is understandable that the loop filter module has a large amount of data read and write interaction with external memory (referred to as external memory), making it a bottleneck module for the back-end performance and bandwidth of the video decoder. In order to save external memory access bandwidth, the design of the on-chip cache (SRAM) of the loop filter module has become a research focus.

[0061] Referring to Figure 1, which is a schematic diagram of the data flow between the on-chip cache and external memory of a loop filtering module in the prior art: as shown in Figure 1, the loop filtering module comprises four sub-modules, namely a reconstruction module, a prefetch module, a filtering execution module, and an external memory read / write arbitration module, which execute in parallel. The functions of each module are as follows:

[0062] Reconstruction module: Used to write the reconstructed pixel data into the on-chip cache (SRAM) to provide filtering data for the filtering execution module.

[0063] Prefetch module: Used to write the prefetched upper / left neighbor data into the on-chip cache (SRAM) to provide filtering data for the filtering execution module.

[0064] Filtering execution module: used to read the data to be filtered from the on-chip cache (SRAM), perform the filtering operations, and write the filtered data back to the on-chip cache (SRAM).

[0065] External memory read / write arbitration module: used to read filtered data from on-chip cache (SRAM) and write it back to external memory.

[0066] The read / write performance of the on-chip cache (SRAM) in the loop filter module directly affects the execution performance of the loop filter. The main factor affecting the read / write performance of the on-chip cache (SRAM) is the parallel read / write capability of the on-chip cache (SRAM) within one clock cycle. Different designs of the on-chip cache (SRAM) in the loop filter module have different impacts on hardware resource consumption and the filtering performance of the hardware decoder. The shortcomings of existing designs of the on-chip cache (SRAM) in the loop filter module of the hardware decoder mainly include the following aspects:

[0067] (1) On-chip cache (SRAM) only supports single read and write, and the filtered data at the loop filter boundary needs to be read and written in multiple clock cycles.

[0068] (2) The data unit size of the on-chip cache (SRAM) is not flexible enough and cannot simultaneously support the reading and writing of loop filtering vertical and horizontal boundary filtering data within one clock cycle.

[0069] (3) The read / write unit size of the on-chip cache (SRAM) does not match that of the memory access path interface, so data read / write interaction with the external memory cannot be completed within one clock cycle.

[0070] To address the above issues, this application proposes a 4-SRAM implementation scheme for the loop filtering module of decoding standards based on coding tree units (CTUs). Figure 2 shows the overall architecture of the 4-SRAM loop filter module. An address mapping module is placed between the filter execution module and the on-chip cache (SRAM) in the loop filter module, so that the four SRAMs can perform parallel read or write operations within one clock cycle. That is, the loop filter module can read or write the filtering data required for a vertical or horizontal boundary within one clock cycle, thereby improving its filtering performance.

[0071] Here, 4-SRAM refers to the parallel read / write address mapping of the pixel data blocks of four target units. Which four target-unit data blocks are read or written each time is determined by the position of the filter boundary in the decoded macroblock.

[0072] It should be noted that the 4-SRAM in this application is designed as 4-way parallel based on the current algorithm and hardware resources. If different algorithms or hardware resources are used, it may not necessarily be 4-way parallel, but could be N-way parallel, thus setting an N-SRAM implementation scheme. The value of N is determined according to the size and format of the input data required by the specific algorithm, as well as the hardware resources (gate count, area, etc.). The 4-way parallel 4-SRAM scheme in this application is only illustrative and should not be used as a specific limitation on this scheme.
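The benefit of N-way banking can be shown with a simple cycle count: if each single-port SRAM serves one access per clock cycle and different SRAMs work in parallel, the cycles needed for a group of accesses equal the largest number of accesses that land in the same SRAM. This is a simplified model that ignores pipelining and port arbitration, and `cycles_needed` is a hypothetical name. The example row of SRAM indices is the first row listed in [0100].

```python
from collections import Counter


def cycles_needed(banks):
    """Minimum clock cycles to serve one access per entry of `banks`,
    assuming each single-port SRAM serves one read or write per cycle
    and distinct SRAMs operate in parallel."""
    if not banks:
        return 0
    return max(Counter(banks).values())


# Four target units in one row (X-dir), as used for a vertical boundary.
print(cycles_needed([0, 0, 0, 0]))  # one shared SRAM: 4
print(cycles_needed([0, 1, 2, 3]))  # 4-SRAM, distinct SRAMs: 1
```

Any mapping that puts two of the four units in the same SRAM raises the count above one cycle, which is what the mapping conditions are designed to prevent.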

[0073] This solution can be applied to video coding formats such as HEVC, AVS2, VP9, AV1, AVS3, and VVC.

[0074] The technical solution of this application and how it solves the above-mentioned technical problems will be described in detail below with specific embodiments. These specific embodiments can be combined with each other, and the same or similar concepts or processes may not be described again in some embodiments. The embodiments of this application will be described below with reference to the accompanying drawings.

[0075] In one exemplary embodiment, as shown in Figure 3, a loop filtering method is provided. In this embodiment, the method includes the following steps:

[0076] Step S310: When reading reconstructed pixel data from the on-chip cache or writing filtered pixel data to the on-chip cache, N target units of a decoded macroblock are processed each time; wherein, each decoded tree block can be divided into M decoded macroblocks, a decoded macroblock can be divided into N sub-blocks, each sub-block can be divided into N target units, the N target units belong to the same sub-block, and the addresses of each sub-block in the on-chip cache are different.

[0077] It is understandable that a decoded frame (decoded image) can be divided into multiple decoded tree blocks (CTUs) of the same size, each decoded tree block (CTU) can be divided into multiple decoded macroblocks (MBs) of the same size, each decoded macroblock (MB) can be divided into multiple subblocks of the same size, and each subblock can be further divided into multiple target units of the same size.

[0078] For example, if the size of the decoding tree block (CTU) is 128×128 pixels and the size of the decoding macroblock (MB) is 16×16 pixels, then one CTU can be divided into 64 MBs; if the size of the sub-block is 8×8 pixels and the size of the target unit is 4×4 pixels, then each MB can be divided into 4 sub-blocks and 16 target units; then M=64 and N=4.

[0079] For example, if the size of a decode tree unit (CTU) is 64×64 pixels and the size of a decode macroblock (MB) is 16×16 pixels, then one CTU can be divided into 16 MBs. Similarly, if the size of a subblock is 8×8 pixels and the size of a target unit is 4×4 pixels, then each MB can be divided into 4 subblocks and 16 target units; therefore, M=16 and N=4.
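The values of M and N in [0078] and [0079] follow directly from the block sizes. A minimal sketch, assuming square blocks at every level; `partition_counts` is a hypothetical helper:

```python
def partition_counts(ctu: int, mb: int = 16, sub: int = 8, unit: int = 4):
    """Return (M, N) for square CTUs, MBs, sub-blocks and target units.

    M = MBs per CTU; N = sub-blocks per MB, which must also equal the
    number of target units per sub-block for the N-way scheme.
    """
    for outer, inner in ((ctu, mb), (mb, sub), (sub, unit)):
        if outer % inner:
            raise ValueError(f"{inner}-pixel blocks do not tile a {outer}-pixel block")
    m = (ctu // mb) ** 2
    n = (mb // sub) ** 2
    if (sub // unit) ** 2 != n:
        raise ValueError("sub-blocks per MB and target units per sub-block differ")
    return m, n


print(partition_counts(128))  # (64, 4)
print(partition_counts(64))   # (16, 4)
```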

[0080] Referring to Figure 4, which is a schematic diagram of the relationship between a decoding tree block, a decoding macroblock, and a target unit in one embodiment: the whole of Figure 4 can be viewed as one decoding tree unit (CTU) divided into 16 decoding macroblocks (MBs), whose indices (denoted as …) range from 0 to 15. Each MB is divided into 4 sub-blocks of 8×8 pixels, and each sub-block is further divided into 4 target units (s0, s1, s2, s3 in the figure).

[0081] It should be noted that the four sub-blocks of a decoding macroblock (MB) can be arranged as 1×4, 4×1, or 2×2; for example, the 4-SRAM group of this application shown in Figure 6 uses the 2×2 arrangement. The following embodiments take the 2×2 arrangement as an example. If a 1×4 or 4×1 arrangement is required, those skilled in the art can make corresponding adjustments, for example by adapting the specific mapping formulas (formulas (1) to (4) below).

[0082] Step S320: Obtain the associated attribute information of the target sub-blocks to which the N target units belong; the associated attribute information includes the first position information, the first size information and the color component information of the target decoding tree block to which the target sub-block belongs, the second position information of the target decoding macroblock to which the target sub-block belongs in the target decoding tree block, and the third position information of the target sub-block in the target decoding macroblock; the first position information is the coordinate of the target decoding tree block in the width direction of the decoded image.

[0083] It is understandable that each decoded image consists of multiple decoded tree blocks. As shown in Figure 4, each decoding tree block includes M decoding macroblocks, each decoding macroblock includes N sub-blocks, and each sub-block contains N target units. That is, the decoded image, decoding tree block, decoding macroblock, sub-block, and target unit form a nested containment hierarchy.

[0084] Furthermore, the on-chip cache (SRAM) is designed to cache the data of one 128×128 CTU (ctux128, i.e., 128×128 pixels). If the decoded tree block is 64×64 pixels, two 64×64 CTUs (ctux64) can be cached along the x direction of the image, i.e., the image width. The on-chip cache (SRAM) mapping therefore only needs to consider the ctux coordinate, not the coordinate in the y direction. Accordingly, the first position information is the coordinate of the target decoded tree block in the width direction of the decoded image, denoted as ctux.

[0085] The first size information is denoted as ctu128, and its value is either 0 or 1. 0 indicates that the size of the target decoding tree block is 64×64 pixels, and 1 indicates that the size of the target decoding tree block is 128×128 pixels.

[0086] The second position information (denoted as …) ranges from 0 to 15 when the CTU size is 64×64 pixels, and from 0 to 63 when the CTU size is 128×128 pixels.

[0087] The color component information (denoted as …) can take the value 0, 1, or 2, distinguishing the luminance component (Luma), the chroma component Cb, and the chroma component Cr.

[0088] The target sub-block is 8×8 pixels, and the third position information (denoted as …) can take the value 0, 1, 2, or 3.
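The associated attribute information of [0084] to [0088] can be held in a small record with range checks. In this sketch the class name and the field names other than `ctux` and `ctu128` are hypothetical, and the numeric codes 0, 1, 2 for Luma, Cb, Cr are an assumed assignment, since the original symbols are not recoverable here.

```python
from dataclasses import dataclass

LUMA, CB, CR = 0, 1, 2  # assumed codes for the color component information


@dataclass(frozen=True)
class SubBlockAttrs:
    ctux: int     # first position info: CTU coordinate along the image width
    ctu128: int   # first size info: 1 -> 128x128 CTU, 0 -> 64x64 CTU
    color: int    # color component info (LUMA, CB or CR, assumed codes)
    mb_pos: int   # second position info: MB index within the CTU
    sub_pos: int  # third position info: 8x8 sub-block index within the MB

    def __post_init__(self):
        if self.ctux < 0:
            raise ValueError("ctux must be non-negative")
        if self.ctu128 not in (0, 1):
            raise ValueError("ctu128 must be 0 or 1")
        if self.color not in (LUMA, CB, CR):
            raise ValueError("color must be 0, 1 or 2")
        mb_count = 64 if self.ctu128 else 16
        if not 0 <= self.mb_pos < mb_count:
            raise ValueError(f"mb_pos must be in 0..{mb_count - 1}")
        if not 0 <= self.sub_pos < 4:
            raise ValueError("sub_pos must be in 0..3")


attrs = SubBlockAttrs(ctux=3, ctu128=0, color=LUMA, mb_pos=7, sub_pos=2)
```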

[0089] Referring to Figure 5, which is a schematic diagram of a 64×64-pixel CTU in one embodiment, the figure illustrates the values of the third position information for the MB whose second position information is 7.

[0090] The 4-SRAM design of this application reads or writes the pixel data of 4 target units in a single operation. Because these four target units belong to the same sub-block, their location can be characterized by the third position information, i.e., the position of the target sub-block within the decoded macroblock.

[0091] Step S330: Based on the associated attribute information, perform address mapping to obtain the target location information corresponding to the target sub-blocks to which the N target units belong in the on-chip cache, and read the reconstructed pixel data or write the filtered pixel data based on the target location information.

[0092] It is understandable that the mapping addresses of the target units within a sub-block are fixed for a given sub-block position and are the same in every decoded macroblock. For example, as shown in Figure 4, the address order of the target units in the upper-left sub-block of each decoded macroblock is always s0, s1, s2, s3, and that in the lower-left sub-block is always s2, s3, s1, s0. Therefore, determining the position of a sub-block is equivalent to determining the position of each target unit within it.

[0093] Therefore, the 4-SRAM address mapping mainly involves obtaining the addresses of the target sub-blocks (sub_blocks) belonging to N target units in the SRAM. The on-chip cache (SRAM) can cache data from several decoded macroblocks. The 4-SRAM addresses are calculated based on the internal design of the on-chip cache (SRAM). The process is: address of the on-chip cache (SRAM) corresponding to the decoded tree block (CTU) → address of the on-chip cache (SRAM) corresponding to the decoded macroblock (MB) → address of the sub-block within its respective decoded macroblock (MB). Through address mapping, the target location information corresponding to the target sub-blocks belonging to each target unit in the on-chip cache is obtained. Then, based on the target location information, reconstructed pixel data is read or filtered pixel data is written.
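The CTU → MB → sub-block address chain of [0093] can be sketched for the luma component. The layout below is an assumption, not the layout defined by this application: the cache holds the MBs of one 128×128 CTU, or of two 64×64 CTUs side by side along the image width ([0084]) selected by the parity of ctux; each MB slot holds 4 sub-block words per SRAM; and the 4 target units of a sub-block share one word address across the 4 SRAMs. Chroma addressing uses the remapping of [0020], [0023], and [0024], whose formulas are not given in this section, so it is omitted. `luma_subblock_address` is a hypothetical name.

```python
def luma_subblock_address(ctux: int, ctu128: int, mb_pos: int, sub_pos: int) -> int:
    """Hypothetical per-SRAM word address of an 8x8 luma sub-block.

    Assumed layout: up to 64 MB slots, 4 sub-block words per MB slot,
    and the 4 target units of a sub-block at the same word address in
    the 4 different SRAMs.
    """
    if ctu128:
        mb_slot = mb_pos                    # CTU level: whole buffer, MBs 0..63
    else:
        mb_slot = (ctux % 2) * 16 + mb_pos  # CTU level: even/odd 64x64 CTU half
    return mb_slot * 4 + sub_pos            # MB level, then sub-block level


print(luma_subblock_address(ctux=1, ctu128=0, mb_pos=0, sub_pos=0))  # 64
```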

[0094] In the above loop filtering method, an address mapping relationship is established between data blocks and the on-chip cache. When reconstructed pixel data is read from the on-chip cache or filtered pixel data is written into it, N target units of a decoded macroblock are processed each time, so that N SRAMs can perform parallel read or write operations within one clock cycle. For example, the loop filtering module can read or write the filtering data required for a vertical or horizontal boundary within one clock cycle, thereby improving its filtering performance.

[0095] In an exemplary embodiment, the number N of target units processed each time is 4, and the on-chip cache addresses of the target units into which each decoded macroblock is divided satisfy the following read/write request conditions:

[0096] The N target units in the horizontal direction of the decoded macroblock have different mapping addresses in their respective sub-blocks, so that the filtering data at a vertical filtering boundary can be read and written within one clock cycle;

[0097] The N target units in the vertical direction of the decoded macroblock have different mapping addresses in their respective sub-blocks, so that the filtering data at a horizontal boundary can be read and written within one clock cycle;

[0098] The 2×2 target units of a sub-block have mutually different mapped addresses, so that they can be read and written within one clock cycle; and

[0099] The mapping addresses of the target units in the N sub-blocks of a macroblock-sized rectangle obtained by moving diagonally by 2×2 target units are the same as those of the target units in the N sub-blocks of the decoded macroblock at the starting position.

[0100] The 4-SRAM address mapping module of this application mainly performs the 4-SRAM address mapping of the target units, which must satisfy four conditions, as shown in Figure 6. Condition 1: in the horizontal direction (X-dir) of the decoded macroblock, the filtering data at a vertical filtering boundary can be read and written within one clock cycle; that is, the SRAMn indices (s#n) of the 4-SRAM are all different along the X-dir. As shown in Figure 6, the four units in each row have different addresses, one each of s0, s1, s2, and s3: the first row is s0, s1, s2, s3; the second row is s3, s2, s1, s0; and so on, with no address repeated within a row.

[0101] Condition 2 is analogous: in the vertical direction (Y-dir) of the decoded macroblock, the filtering data at a horizontal boundary can be read and written within one clock cycle; that is, the SRAMn indices (s#n) of the 4-SRAM are all different along the Y-dir. As shown in Figure 6, the four units in each column also have different addresses, one each of s0, s1, s2, and s3: the first column is s0, s3, s2, s1; the second column is s1, s2, s3, s0; and so on, with no address repeated within a column.

[0102] Condition 3: each 2×2 4-SRAM group (corresponding to one sub-block) is read and written within one clock cycle; that is, the SRAMn indices within the 2×2 group are all different.

[0103] Condition 4: 2-SRAM alignment is satisfied, and the internal mapped addresses of an accessed 2×2 4-SRAM group are fixed. That is, after moving by two 4×4 blocks to the right and two down at the same time, or by two 4×4 blocks to the left and two up at the same time, the resulting data block has a fixed sram#n (abbreviated s#n) assignment: as shown by the green box in Figure 4, its mapped addresses (s#n) are the same as those of the target units in the red box. Here s#n denotes the address of SRAMn in the 4-SRAM.

[0104] It should be noted that the 4-SRAM address mapping is not limited to the mapping relationship shown in Figure 4; any address mapping relationship that satisfies the above four conditions is acceptable.
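Because any mapping that meets the four conditions is acceptable, a candidate layout can be checked mechanically. The following is a minimal Python sketch, not the applicant's implementation. It assumes a decoded macroblock is a 4×4 grid of target units (2×2 sub-blocks of 2×2 units each) and encodes s#n as the integer n. The first two rows and first two columns of the layout are those quoted in [0100] and [0101]; the remaining 2×2 corner is the one forced by condition 4. The names `GRID`, `all_distinct`, and `check_conditions` are hypothetical.

```python
# Hypothetical 4x4 target-unit layout of one decoded macroblock: GRID[r][c] = n for s#n.
# Rows 0-1 and columns 0-1 follow [0100]-[0101]; the lower-right 2x2 corner is forced by condition 4.
GRID = [
    [0, 1, 2, 3],
    [3, 2, 1, 0],
    [2, 3, 0, 1],
    [1, 0, 3, 2],
]


def all_distinct(banks):
    """True if every access hits a different SRAM, so all can happen in one clock cycle."""
    return len(set(banks)) == len(banks)


def check_conditions(grid):
    """Return (condition1, condition2, condition3, condition4) for a square bank layout."""
    size = len(grid)  # target units per macroblock side (4 here)
    # Condition 1: each row (X-dir) uses N different SRAMs.
    c1 = all(all_distinct(row) for row in grid)
    # Condition 2: each column (Y-dir) uses N different SRAMs.
    c2 = all(all_distinct([grid[r][c] for r in range(size)]) for c in range(size))
    # Condition 3: each 2x2-aligned group (one sub-block) uses N different SRAMs.
    c3 = all(
        all_distinct([grid[r][c], grid[r][c + 1], grid[r + 1][c], grid[r + 1][c + 1]])
        for r in range(0, size, 2)
        for c in range(0, size, 2)
    )
    # Condition 4: every macroblock uses the same layout, so a macroblock-sized window
    # shifted diagonally by 2 units reads grid[(r + 2) % size][(c + 2) % size].
    c4 = all(
        grid[(r + 2) % size][(c + 2) % size] == grid[r][c]
        for r in range(size)
        for c in range(size)
    )
    return c1, c2, c3, c4


print(check_conditions(GRID))  # (True, True, True, True)
```

For contrast, a naive layout that repeats s0, s1, s2, s3 in every row passes condition 1 but fails conditions 2 to 4, which illustrates why a simple round-robin bank assignment is not sufficient.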

[0105] In this embodiment, giving each unit a different address, i.e., a different n in sram#n, allows all units to be read or written at once, so that the filtering data required by the loop filtering module for a vertical or horizontal boundary can be obtained within one clock cycle, improving the filtering performance of the loop filtering module.

[0106] In an exemplary embodiment, as shown in Figure 7, performing address mapping based on the associated attribute information to obtain the target location information, in the on-chip cache, of the target sub-blocks to which the N target units belong includes the following steps:

[0107] Step S710: Based on the color component information, determine the fourth position information of the target decoding macroblock in the on-chip cache and the fifth position information of each sub-block in the target decoding macroblock in the on-chip cache.

[0108] Determining the fourth position information of the target decoding macroblock based on the color component information includes: when the color component information indicates the luma component, the fourth position information is the same as the second position information, so the second position information can be used directly as the fourth position information; when the color component information indicates a chroma component, address remapping is performed on the second position information to obtain the fourth position information.

[0109] Specifically, the process of determining the fourth position information of the target decoded macroblock from the color component information can be expressed by the following formula:

[0110] (1)

[0111] In formula (1), the quantities are: the fourth position information of the target decoded macroblock; the second position information of the target decoded macroblock, i.e., the position of the target decoded macroblock in the target decoded tree block (or the corresponding value); and the color component information, which indicates either the luma component or a chroma component such as Cb or Cr. The operators are defined as follows: A << B shifts the binary representation of A left by B bits, filling the vacated low-order bits with 0s; A >> B shifts the binary representation of A right by B bits; A & 0x1 is a bitwise AND that keeps only the lowest bit of A and clears all higher bits; A & 0xf is a bitwise AND that keeps only the lowest 4 bits of A and clears the higher bits.

[0112] Determining, according to the color component information, the fifth position information of each sub-block of the target decoded macroblock in the on-chip cache includes: when the color component information is the luminance component, the fifth position information is the same as the third position information; when the color component information is the chrominance component Cb, the fifth position information is determined using a first address mapping relationship; and when the color component information is the chrominance component Cr, the fifth position information is determined using a second address mapping relationship.

[0113] Specifically, the process of determining the fifth position information of each sub-block by the color component information can be expressed by the formula:

[0114] (2)

[0115] Formula (2) gives the fifth position information of each sub-block in terms of the third position information of that sub-block.

[0116] It can be understood that the on-chip cache (SRAM) first stores the luma component data of 2 decoding tree blocks (CTUs), followed by the corresponding Cb chroma data and Cr chroma data. The Cb and Cr data corresponding to the luma data of the same decoded macroblock are stored contiguously in the SRAM, with Cb occupying only even positions and Cr only odd positions. On this basis, the mapping relationships in the above formula are set separately for Cb chroma blocks and Cr chroma blocks.

[0117] Step S720: obtain the address of the target decoded macroblock in the on-chip cache according to the first position information, the first size information, and the color component information of the target decoded tree block, and the fourth position information.

[0118] Specifically, the process of determining the address of the target decoded macroblock in the on-chip cache can be expressed by the formula:

[0119] (3)

[0120] In formula (3), the quantities are: the address of the target decoded macroblock in the on-chip cache; the first size information of the target decoded tree block; the first position information of the target decoding tree block; the color component information; and the fourth position information of the target decoded macroblock.

[0121] Step S730: Based on the address of the target decoding macroblock in the on-chip cache and the fifth position information of each sub-block, obtain the target position information corresponding to the target sub-block to which the N target units belong in the on-chip cache.

[0122] Specifically, firstly, based on the fifth position information of each sub-block, the internal address of each sub-block within its corresponding decoded macroblock is determined. This process can be expressed by the formula:

[0123] (4)

[0124] In formula (4), the result is the address of the sub-block within its decoded macroblock, namely the SRAM address of the sub-block, which contains 4 SRAM cells; the input is the fifth position information of the sub-block.

[0125] Then, from the on-chip cache address of the target decoding macroblock and the address of each sub-block, the target location information, in the on-chip cache, of the target sub-blocks to which the N target units belong is obtained, as expressed by the formula:

[0126] (5)

[0127] Figure 8 is a schematic diagram of an application instance in which address mapping is performed on a target sub-block based on its associated attribute information to obtain its target location information in the on-chip cache. In this instance, … is 7, … is 0 (luma block), and … is 1. Calculating with formulas (1) to (5) gives … = 28, … = 1, and … = 29. The 4-SRAM data blocks retrieved at this index are s2, s3, s1, and s0.

[0128] This embodiment calculates addresses layer by layer from the associated attribute information of the target units to obtain the target location information of each target unit in the on-chip cache, thereby associating the target units with on-chip cache addresses and enabling data to be read or written accurately.

[0129] In an exemplary embodiment, reading reconstructed pixel data or writing filtered pixel data based on target location information includes: determining the corresponding data blocks in the on-chip cache of the target sub-blocks to which N target units belong based on the target location information; determining the position of the filtering boundary of the target decoding macroblock and the position of the adjacent input pixel data on both sides of the filtering boundary in the data block, so as to read reconstructed pixel data or write filtered pixel data.

[0130] It can be understood that the position of the filtering data in the 4-SRAM, i.e., which SRAMn holds it, is determined by the position of the filtering boundary within the decoding block. As shown in Figure 9, the filtering of a basic data unit includes vertical-boundary filtering and horizontal-boundary filtering: vertical-boundary filtering requires 4×1 SRAM data, and horizontal-boundary filtering requires 1×4 SRAM data.

[0131] First, the 4-SRAM data block is read according to the mapped address. Then, the position of the required input data within the data block is determined from the filtering boundary; that is, based on the position of the filtering boundary in the decoded macroblock (MB), it is determined which sram#n of the 4-SRAM data, read at the mapped address obtained in the foregoing embodiment, holds the input pixel data on each side of the filtering boundary.

[0132] Further, determining the location of the filtering boundary of the target decoding macroblock and the location of the adjacent input pixel data on both sides of the filtering boundary in the data block includes: for filtering of the vertical boundary, determining the parity of the number of 1s in the binary number of the Id value of the 8×4 block in the target decoding macroblock; based on the parity determination result, determining the address of the input pixel data required for filtering of the vertical boundary in the data block; for filtering of the horizontal boundary, determining the parity of the number of 1s in the binary number of the Id value of the 4×8 block in the target decoding macroblock; based on the parity determination result, determining the address of the input pixel data required for filtering of the horizontal boundary in the data block.

[0133] For example, as shown in Figure 9, the filtering data required for the vertical boundary is s3, s2, s1, s0, while that required for the horizontal boundary is s1, s2, s3, s0. The mapping of the filtering boundary data in the 4-SRAM thus determines the position of s# in the 4-SRAM.

[0134] Specifically, the address calculation of SRAMn in 4-SRAM is explained as follows:

[0135] Mapping of vertical-boundary filtering data: as shown in Figure 10, the ID of an 8×4 block within a decoded macroblock (MB) ranges from 0 to 7. The mapping between the address and the parity of the number of 1s in the binary representation of the ID of the 8×4 block in the target decoded macroblock can be expressed by the formula:

[0136] (6)

[0137] The parity of the number of 1s in the binary representation of the ID determines whether the data is the s0s1 combination or the s2s3 combination. Specifically, for SRAM_Id values 0, 3, 5, and 6, the number of 1s in the binary representation is even, giving the s0s1 combination; for SRAM_Id values 1, 2, 4, and 7, the number of 1s is odd, giving the s2s3 combination.

[0138] Mapping of horizontal-boundary filtering data: as shown in Figure 11, the ID of a 4×8 block within a decoded macroblock (MB) ranges from 0 to 7. The mapping between the address and the parity of the number of 1s in the binary representation of the ID of the 4×8 block in the target decoded macroblock can be expressed by the formula:

[0139] (7)

[0140] The parity of the number of 1s in the binary representation of the ID determines whether the data is the s0s3 combination or the s1s2 combination. Specifically, for SRAM_Id values 0, 3, 5, and 6, the number of 1s in the binary representation is even, giving the s0s3 combination; for SRAM_Id values 1, 2, 4, and 7, the number of 1s is odd, giving the s1s2 combination.
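The parity rules of [0137] and [0140] can be sketched as follows. This is a minimal Python illustration, not the applicant's formulas (6) and (7), which are not reproduced in this text. It assumes that 8×4 block IDs are numbered in raster order over a grid two blocks wide and four blocks tall, and 4×8 block IDs in raster order over a grid four blocks wide and two blocks tall; under that assumption the parity rules agree with the 4×4 target-unit layout implied by the rows and columns listed in [0100] and [0101] (repeated here as `GRID`). All function names are hypothetical.

```python
# Hypothetical 4x4 target-unit layout of one decoded macroblock: GRID[r][c] = n for s#n.
GRID = [
    [0, 1, 2, 3],
    [3, 2, 1, 0],
    [2, 3, 0, 1],
    [1, 0, 3, 2],
]


def has_even_ones(block_id: int) -> bool:
    """True if the binary representation of block_id contains an even number of 1s."""
    return bin(block_id).count("1") % 2 == 0


def vertical_boundary_srams(block_id: int) -> set:
    """SRAMs holding an 8x4 block (two side-by-side target units): s0s1 or s2s3."""
    return {0, 1} if has_even_ones(block_id) else {2, 3}


def horizontal_boundary_srams(block_id: int) -> set:
    """SRAMs holding a 4x8 block (two stacked target units): s0s3 or s1s2."""
    return {0, 3} if has_even_ones(block_id) else {1, 2}


def agrees_with_layout(grid) -> bool:
    """Compare the parity rules with the SRAMs read directly from the layout."""
    for block_id in range(8):
        # Assumed raster order: 8x4 blocks form 2 columns x 4 rows of unit pairs.
        row, pair = divmod(block_id, 2)
        if {grid[row][2 * pair], grid[row][2 * pair + 1]} != vertical_boundary_srams(block_id):
            return False
        # Assumed raster order: 4x8 blocks form 4 columns x 2 rows of unit pairs.
        pair_row, col = divmod(block_id, 4)
        if {grid[2 * pair_row][col], grid[2 * pair_row + 1][col]} != horizontal_boundary_srams(block_id):
            return False
    return True


print(agrees_with_layout(GRID))  # True
```

Since only the parity of a 3-bit ID is needed, in hardware the selection reduces to the XOR of the three ID bits.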

[0141] It should be noted that the example in this embodiment is for illustrative purposes only, and the calculation of the SRAMn address is not limited to this calculation.

[0142] The 4-SRAM implementation scheme proposed for the loop filter module of a hardware decoder has the following advantages:

[0143] (1) Hardware-friendly: The data unit size of 4-SRAM is fixed and the address mapping rules are also fixed. This implementation method simplifies the hardware implementation logic.

[0144] (2) Improved filtering performance: The hardware implementation scheme of 4-SRAM proposed in this solution can complete the reading or writing of the filtering data required for the filtering boundary within one clock cycle.

[0145] (3) Compatibility with the bit width of the memory-access data path interface: the 4-SRAM implementation scheme can read or write 512 bits of data in one clock cycle, so when the data bit width of the memory-access data path interface is at most 512 bits, data preparation can be completed in one clock cycle.
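As a rough consistency check of the 512-bit figure, assume that each target unit is a 4×4 block of 8-bit samples and that each of the four SRAMs returns one target unit per access. The bit depth is an assumption; the text does not state it.

$$
4 \times 4\ \text{samples} \times 8\ \tfrac{\text{bits}}{\text{sample}} = 128\ \text{bits per SRAM access},
\qquad
4\ \text{SRAMs} \times 128\ \text{bits} = 512\ \text{bits per clock cycle}.
$$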

[0146] (4) Universality across video decoding standards: the 4-SRAM implementation scheme proposed here is applicable to the loop filtering modules of all video standards based on coding tree units (CTUs), such as HEVC, AVS2, VP9, AV1, AVS3, and VVC.

[0147] It should be understood that although the steps in the flowcharts of the embodiments described above are shown sequentially according to the arrows, these steps are not necessarily executed in the order indicated by the arrows. Unless explicitly stated herein, there is no strict order restriction on the execution of these steps, and they can be executed in other orders. Moreover, at least some steps in the flowcharts of the embodiments described above may include multiple steps or multiple stages. These steps or stages are not necessarily completed at the same time, but can be executed at different times. The execution order of these steps or stages is not necessarily sequential, but can be performed alternately or in turn with other steps or at least some of the steps or stages of other steps.

[0148] Based on the same inventive concept, this application also provides a loop filtering device for implementing the loop filtering method described above. The solution provided by this device is similar to the solution described in the above method; therefore, the specific limitations in one or more loop filtering device embodiments provided below can be found in the limitations of the loop filtering method described above, and will not be repeated here.

[0149] In an exemplary embodiment, as shown in Figure 12, a loop filter device is provided, comprising:

[0150] A target determination module 1201, configured to process N target units of a decoded macroblock each time reconstructed pixel data is read from the on-chip cache or filtered pixel data is written to the on-chip cache, wherein each decoding tree block can be divided into M decoded macroblocks, a decoded macroblock can be divided into N sub-blocks, each sub-block can be divided into N target units, the N target units belong to the same sub-block, and each sub-block has a different address in the on-chip cache;

[0151] The information acquisition module 1202 is used to acquire the associated attribute information of the target sub-blocks to which N target units belong; the associated attribute information includes the first position information, first size information and color component information of the target decoding tree block to which the target sub-block belongs, the second position information of the target decoding macroblock to which the target sub-block belongs in the target decoding tree block, and the third position information of the target sub-block in the target decoding macroblock; the first position information is the coordinate of the target decoding tree block in the width direction of the decoded image;

[0152] The address mapping module 1203 is used to perform address mapping based on the associated attribute information, obtain the target location information corresponding to the target sub-blocks to which N target units belong in the on-chip cache, and read the reconstructed pixel data or write the filtered pixel data based on the target location information.

[0153] In one embodiment, the number N of target units processed each time is 4, and the on-chip cache addresses of the target units into which each decoded macroblock is divided satisfy the following read/write request conditions:

[0154] The N target units in the horizontal direction of the decoded macroblock have different mapping addresses in their respective sub-blocks, so that the filtering data at a vertical filtering boundary can be read and written within one clock cycle;

[0155] The N target units in the vertical direction of the decoded macroblock have different mapping addresses in their respective sub-blocks, so that the filtering data at a horizontal boundary can be read and written within one clock cycle;

[0156] The 2×2 target units of a sub-block have mutually different mapped addresses, so that they can be read and written within one clock cycle; and

[0157] The mapping addresses of the target units in the N sub-blocks of a macroblock-sized rectangle obtained by moving diagonally by 2×2 target units are the same as those of the target units in the N sub-blocks of the decoded macroblock at the starting position.

[0158] In one embodiment, the address mapping module 1203 is further configured to determine the fourth location information of the target decoding macroblock in the on-chip cache and the fifth location information of each sub-block in the target decoding macroblock in the on-chip cache based on the color component information; obtain the address of the target decoding macroblock in the on-chip cache based on the first location information, first size information, color component information and fourth location information of the target decoding tree block; and obtain the target location information corresponding to the target sub-blocks to which N target units belong in the on-chip cache based on the address of the target decoding macroblock in the on-chip cache and the fifth location information of each sub-block.

[0159] In one embodiment, the address mapping module 1203 is further configured to use the second position information as the fourth position information when the color component information is a luminance component, and to perform address remapping on the second position information to obtain the fourth position information when the color component information is a chrominance component.

[0160] In one embodiment, the address mapping module 1203 is further configured to: use the third position information as the fifth position information when the color component information is a luminance component; determine the fifth position information using a first address mapping relationship when the color component information is the chrominance component Cb; and determine the fifth position information using a second address mapping relationship when the color component information is the chrominance component Cr.

[0161] In one embodiment, the address mapping module 1203 is further configured to determine the corresponding data blocks in the on-chip cache of the target sub-blocks to which the N target units belong based on the target location information; determine the position of the filtering boundary of the target decoding macroblock and the position of the adjacent input pixel data on both sides of the filtering boundary in the data block, so as to read the reconstructed pixel data or write the filtered pixel data.

[0162] In one embodiment, the address mapping module 1203 is further configured to, for filtering of vertical boundaries, determine the parity of the number of 1s in the binary number of the Id value of the 8×4 block in the target decoding macroblock; and, based on the parity determination result, determine the address of the input pixel data required for filtering of vertical boundaries in the data block; for filtering of horizontal boundaries, determine the parity of the number of 1s in the binary number of the Id value of the 4×8 block in the target decoding macroblock; and, based on the parity determination result, determine the address of the input pixel data required for filtering of horizontal boundaries in the data block.

[0163] Each module in the aforementioned loop filter device can be implemented entirely or partially through software, hardware, or a combination thereof. These modules can be embedded in the processor of a computer device in hardware form or independent of it, or stored in the memory of a computer device in software form, so that the processor can call and execute the operations corresponding to each module.

[0164] In an exemplary embodiment, a computer device is provided, which may be a server and whose internal structure may be as shown in Figure 13. The computer device includes a processor, a memory, an input/output (I/O) interface, and a communication interface. The processor, memory, and I/O interface are connected via a system bus, and the communication interface is connected to the system bus via the I/O interface. The processor provides computing and control capabilities. The memory includes a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system, a computer program, and a database. The internal memory provides an environment for running the operating system and computer program stored in the non-volatile storage medium. The database stores data related to loop filtering. The I/O interface is used to exchange information between the processor and external devices. The communication interface is used to communicate with external terminals through a network connection. When executed by the processor, the computer program implements a loop filtering method.

[0165] Those skilled in the art will understand that the structure shown in Figure 13 is merely a block diagram of part of the structure related to the present application and does not limit the computer device to which the present application is applied. A specific computer device may include more or fewer components than shown in the figure, combine certain components, or have a different arrangement of components.

[0166] In one embodiment, a computer device is also provided, including a memory and a processor, wherein the memory stores a computer program, and the processor executes the computer program to implement the steps in the above method embodiments.

[0167] In one embodiment, a computer-readable storage medium is provided having a computer program stored thereon that, when executed by a processor, implements the steps in the above method embodiments.

[0168] In one embodiment, a computer program product is provided, including a computer program that, when executed by a processor, implements the steps in the above method embodiments.

[0169] It should be noted that the user information (including but not limited to user device information, user personal information, etc.) and data (including but not limited to data used for analysis, data stored, data displayed, etc.) involved in this application are all information and data authorized by the user or fully authorized by all parties, and the collection, use and processing of the relevant data must comply with relevant regulations.

[0170] Those skilled in the art will understand that all or part of the processes in the above embodiments can be implemented by a computer program instructing related hardware. The computer program can be stored in a non-volatile computer-readable storage medium, and when executed, it can include the processes of the embodiments of the above methods. Any references to memory, databases, or other media used in the embodiments provided in this application can include at least one of non-volatile memory and volatile memory. Non-volatile memory can include read-only memory (ROM), magnetic tape, floppy disk, flash memory, optical memory, high-density embedded non-volatile memory, resistive random access memory (ReRAM), magnetic random access memory (MRAM), ferroelectric random access memory (FRAM), phase change memory (PCM), graphene memory, etc. Volatile memory can include random access memory (RAM) or external cache memory, etc. By way of illustration and not limitation, RAM can take many forms, such as Static Random Access Memory (SRAM) or Dynamic Random Access Memory (DRAM). The databases involved in the embodiments provided in this application may include at least one type of relational database and non-relational database. Non-relational databases may include, but are not limited to, blockchain-based distributed databases. The processors involved in the embodiments provided in this application may be general-purpose processors, central processing units, graphics processing units, digital signal processors, programmable logic devices, quantum computing-based data processing logic devices, artificial intelligence (AI) processors, etc., and are not limited to these.

[0171] The technical features of the above embodiments can be combined in any way. For the sake of brevity, not all possible combinations of the technical features in the above embodiments are described. However, as long as there is no contradiction in the combination of these technical features, they should be considered to be within the scope of this application.

[0172] The embodiments described above are merely illustrative of several implementation methods of this application, and while the descriptions are specific and detailed, they should not be construed as limiting the scope of this patent application. It should be noted that those skilled in the art can make various modifications and improvements without departing from the concept of this application, and these all fall within the protection scope of this application. Therefore, the protection scope of this application should be determined by the appended claims.

Claims

1. A loop filtering method, characterized in that, The method includes: When reading reconstructed pixel data from the on-chip cache or writing filtered pixel data to the on-chip cache, N target units of a decoded macroblock are processed each time; wherein, each decoded tree block can be divided into M decoded macroblocks, a decoded macroblock can be divided into N sub-blocks, each sub-block can be divided into N target units, the N target units belong to the same sub-block, and each sub-block has a different address in the on-chip cache; Obtain the associated attribute information of the target sub-blocks to which the N target units belong; the associated attribute information includes the first position information, the first size information, and the color component information of the target decoding tree block to which the target sub-block belongs, the second position information of the target decoding macroblock to which the target sub-block belongs in the target decoding tree block, and the third position information of the target sub-block in the target decoding macroblock; the first position information is the coordinate of the target decoding tree block in the width direction of the decoded image; Address mapping is performed based on the associated attribute information to obtain the target location information corresponding to the target sub-blocks to which the N target units belong in the on-chip cache. The reconstructed pixel data is read or the filtered pixel data is written based on the target location information.

2. The method according to claim 1, characterized in that the number N of target units processed each time is 4, and the on-chip cache addresses of the target units into which each decoded macroblock is divided satisfy the following read/write request conditions: the N target units in the horizontal direction of the decoded macroblock have different mapping addresses in their respective sub-blocks, so that the filtering data at a vertical filtering boundary can be read and written within one clock cycle; the N target units in the vertical direction of the decoded macroblock have different mapping addresses in their respective sub-blocks, so that the filtering data at a horizontal boundary can be read and written within one clock cycle; the 2×2 target units of a sub-block have mutually different mapped addresses, so that they can be read and written within one clock cycle; and the mapping addresses of the target units in the N sub-blocks of a macroblock-sized rectangle obtained by moving diagonally by 2×2 target units are the same as those of the target units in the sub-blocks of the decoded macroblock at the starting position.

3. The method according to claim 1, characterized in that performing address mapping based on the associated attribute information to obtain the target location information, in the on-chip cache, of the target sub-blocks to which the N target units belong includes: determining, based on the color component information, the fourth position information of the target decoding macroblock in the on-chip cache and the fifth position information of each sub-block of the target decoding macroblock in the on-chip cache; obtaining the address of the target decoding macroblock in the on-chip cache based on the first position information, the first size information, and the color component information of the target decoding tree block and the fourth position information; and obtaining the target location information, in the on-chip cache, of the target sub-blocks to which the N target units belong based on the address of the target decoding macroblock in the on-chip cache and the fifth position information of each sub-block.

4. The method according to claim 3, characterized in that determining the fourth position information of the target decoding macroblock based on the color component information includes: when the color component information indicates a luminance component, the fourth position information is the same as the second position information; when the color component information indicates a chrominance component, the second position information is remapped to obtain the fourth position information.
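Claim 4 leaves the chroma remapping unspecified. The sketch below assumes, purely for illustration, 4:2:0 sampling in which one chroma macroblock of the same sample size covers 2×2 luma macroblocks, with the second position given as a (column, row) pair and the fourth position as a raster index. `fourth_position` and the halving rule are hypothetical.

```python
def fourth_position(component: str, pos2: tuple, mbs_per_side: int) -> int:
    """Fourth position (raster index in the cache) of a target macroblock.

    pos2 is a hypothetical (column, row) second position of the macroblock
    inside its decoding tree block.
    """
    mx, my = pos2
    if component == "Y":
        # Luma: the fourth position is the second position itself.
        return my * mbs_per_side + mx
    if component in ("Cb", "Cr"):
        # Hypothetical 4:2:0 remap: halve the macroblock grid in each
        # direction, so 2x2 luma positions share one chroma position.
        half = max(mbs_per_side // 2, 1)
        return (my // 2) * half + (mx // 2)
    raise ValueError(f"unknown component: {component}")
```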

5. The method according to claim 3, characterized in that determining, based on the color component information, the fifth position information in the on-chip cache of each sub-block in the target decoding macroblock includes: when the color component information indicates a luminance component, the fifth position information is the same as the third position information; when the color component information indicates the chrominance component Cb, the fifth position information is determined using a first address mapping relationship; when the color component information indicates the chrominance component Cr, the fifth position information is determined using a second address mapping relationship.
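Claim 5 selects one of three rules for the fifth position according to the color component. A minimal sketch, assuming the first and second address mapping relationships are fixed permutations of the four sub-block slots (raster order: 0 top-left, 1 top-right, 2 bottom-left, 3 bottom-right); the tables `FIRST_MAPPING` and `SECOND_MAPPING` are hypothetical stand-ins.

```python
# Hypothetical permutation tables standing in for the unspecified first (Cb)
# and second (Cr) address mapping relationships of claim 5.
FIRST_MAPPING = (1, 0, 3, 2)   # Cb: swap sub-blocks left/right
SECOND_MAPPING = (2, 3, 0, 1)  # Cr: swap sub-blocks top/bottom


def fifth_position(component: str, pos3: int) -> int:
    """Fifth position of a sub-block, given its third position (0..3)."""
    if component == "Y":
        return pos3  # luma: same as the third position
    if component == "Cb":
        return FIRST_MAPPING[pos3]
    if component == "Cr":
        return SECOND_MAPPING[pos3]
    raise ValueError(f"unknown component: {component}")
```

With these particular tables, the Y, Cb, and Cr sub-blocks at the same third position always land in three different slots.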

6. The method according to claim 1, characterized in that reading the reconstructed pixel data or writing the filtered pixel data based on the target position information includes: obtaining from the on-chip cache, based on the target position information, the data blocks corresponding to the target sub-blocks to which the N target units belong; and determining, within the data block, the position of the filter boundary of the target decoding macroblock and the positions of the adjacent input pixel data on both sides of the filter boundary, so as to read the reconstructed pixel data or write the filtered pixel data.
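Once the data block has been fetched, claim 6 locates the filter boundary and the pixels on each side of it. A minimal sketch for a vertical boundary, assuming the data block is a list of rows and that the filter reads four samples per side, as HEVC deblocking does with p0..p3 and q0..q3; `boundary_pixels` and `write_back` are hypothetical helpers.

```python
def boundary_pixels(block, boundary_col: int, taps: int = 4):
    """Split each row of a data block at a vertical filter boundary.

    Returns (p, q): p[r][k] is the k-th sample left of the boundary
    (p0 nearest), q[r][k] the k-th sample right of it (q0 nearest).
    """
    if not taps <= boundary_col <= len(block[0]) - taps:
        raise ValueError("boundary too close to the block edge")
    p = [row[boundary_col - taps:boundary_col][::-1] for row in block]
    q = [row[boundary_col:boundary_col + taps] for row in block]
    return p, q


def write_back(block, boundary_col: int, p, q) -> None:
    """Store filtered p/q samples back into the data block in place."""
    for row, pr, qr in zip(block, p, q):
        taps = len(pr)
        row[boundary_col - taps:boundary_col] = pr[::-1]
        row[boundary_col:boundary_col + taps] = qr
```

For an 8×4 data block whose vertical boundary lies at column 4, `boundary_pixels(block, 4)` returns the left four samples of each row (nearest first) and the right four samples; writing them back unchanged leaves the block as it was.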

7. The method according to claim 6, characterized in that determining, within the data block, the position of the filter boundary of the target decoding macroblock and the positions of the adjacent input pixel data on both sides of the filter boundary includes: for vertical boundary filtering, determining the parity of the number of 1s in the binary representation of the Id value of the 8×4 block in the target decoding macroblock, and determining, based on the parity result, the addresses in the data block of the input pixel data required for vertical boundary filtering; for horizontal boundary filtering, determining the parity of the number of 1s in the binary representation of the Id value of the 4×8 block in the target decoding macroblock, and determining, based on the parity result, the addresses in the data block of the input pixel data required for horizontal boundary filtering.
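The parity test in claim 7 is a population count modulo 2. The sketch assumes, hypothetically, that the parity selects which half of an 8-sample-wide data block holds the p-side (left) input pixels for vertical boundary filtering; the patent states only that the parity determines the address, not which half corresponds to which result. `popcount_parity` and `vertical_input_columns` are illustrative names.

```python
def popcount_parity(block_id: int) -> int:
    """Return 0 if block_id has an even number of 1 bits, otherwise 1."""
    if block_id < 0:
        raise ValueError("block Id must be non-negative")
    return bin(block_id).count("1") & 1


def vertical_input_columns(block_id: int, taps: int = 4):
    """Hypothetical: (p_cols, q_cols) of an 8x4 block's input pixels."""
    low = list(range(taps))
    high = list(range(taps, 2 * taps))
    # Even parity: p side stored in the low columns; odd parity: mirrored.
    if popcount_parity(block_id) == 0:
        return low, high
    return high, low
```

Horizontal boundary filtering would apply the same test to the Id of the 4×8 block and select rows instead of columns.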

8. A loop filter device, characterized in that the device includes: a target determination module, configured to process N target units of a decoded macroblock each time reconstructed pixel data is read from the on-chip cache or filtered pixel data is written into the on-chip cache, wherein each decoding tree block can be divided into M decoding macroblocks, a decoding macroblock can be divided into N sub-blocks, each sub-block can be divided into N target units, the N target units belong to the same sub-block, and each sub-block has a different address in the on-chip cache; an information acquisition module, configured to acquire the associated attribute information of the target sub-blocks to which the N target units belong, the associated attribute information including the first position information, the first size information, and the color component information of the target decoding tree block to which the target sub-block belongs, the second position information, in the target decoding tree block, of the target decoding macroblock to which the target sub-block belongs, and the third position information of the target sub-block in the target decoding macroblock, wherein the first position information is the coordinate of the target decoding tree block in the width direction of the decoded image; and an address mapping module, configured to perform address mapping based on the associated attribute information to obtain the target position information, in the on-chip cache, of the target sub-blocks to which the N target units belong, and to read the reconstructed pixel data or write the filtered pixel data based on the target position information.
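The three modules of claim 8 form a simple pipeline. A minimal sketch, assuming a software model in which the on-chip cache is a Python list and the address mapping is supplied as a callable, so that any mapping rule (for example, luma and chroma rules of the kind described in claims 3 to 5) can be plugged in. All class and field names are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Iterator, List, Sequence


@dataclass(frozen=True)
class AssociatedAttributes:
    ctb_x: int      # first position information (width direction)
    ctb_size: int   # first size information
    component: str  # color component information: "Y", "Cb" or "Cr"
    mb_pos: int     # second position information (macroblock in CTB)
    sub_pos: int    # third position information (sub-block in macroblock)


class TargetDeterminationModule:
    """Groups target units into batches of N per cache access."""

    def __init__(self, n: int = 4) -> None:
        self.n = n

    def batches(self, units: Sequence) -> Iterator[List]:
        for i in range(0, len(units), self.n):
            yield list(units[i:i + self.n])


class InformationAcquisitionModule:
    """Packages the associated attribute information of a target sub-block."""

    def acquire(self, ctb_x: int, ctb_size: int, component: str,
                mb_pos: int, sub_pos: int) -> AssociatedAttributes:
        return AssociatedAttributes(ctb_x, ctb_size, component, mb_pos, sub_pos)


class AddressMappingModule:
    """Maps attributes to a cache address, then reads or writes there."""

    def __init__(self, cache_words: int,
                 map_fn: Callable[[AssociatedAttributes], int]) -> None:
        self.cache: List = [None] * cache_words
        self.map_fn = map_fn

    def write(self, attrs: AssociatedAttributes, data) -> None:
        self.cache[self.map_fn(attrs)] = data

    def read(self, attrs: AssociatedAttributes):
        return self.cache[self.map_fn(attrs)]
```

For instance, `AddressMappingModule(64, lambda a: a.mb_pos * 4 + a.sub_pos)` stores a sub-block with second position 5 and third position 3 at index 23. Keeping the mapping pluggable separates the module wiring from the specific address rule.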

9. A computer device comprising a memory and a processor, the memory storing a computer program, characterized in that the processor, when executing the computer program, implements the steps of the method according to any one of claims 1 to 7.

10. A computer-readable storage medium having a computer program stored thereon, characterized in that the computer program, when executed by a processor, implements the steps of the method according to any one of claims 1 to 7.

11. A computer program product comprising a computer program, characterized in that the computer program, when executed by a processor, implements the steps of the method according to any one of claims 1 to 7.