An on-chip storage method applicable to motion estimation
A motion-estimation and on-chip storage technology, applied in the field of video coding and decoding, that addresses problems such as excessive occupation of memory bandwidth and on-chip storage.
Active Publication Date: 2008-01-09
ZHANGJIAGANG KANGDE XIN OPTRONICS MATERIAL
Problems solved by technology
Many memory-bandwidth techniques consider only integer-pixel motion estimation; sub-pixel motion estimation and motion compensation are handled separately, which requires additional memory bandwidth and storage.
The method uses a ring-cylinder-shaped on-chip space to realize data prefetching, data reuse, space reuse, motion estimation, and motion compensation. Both the perimeter and the height of the cylinder are measured in pixels: perimeter = EXT+2*SRX+16+2*SRX+EXT+H*16 and height = EXT+2*SRY+16+2*SRY+EXT+(V-1)*16, where SRX is the search range in the x direction, SRY is the search range in the y direction, and EXT is the number of integer pixels above, below, left of, and right of the optimal integer-pixel point required for 1/4-pixel interpolation.
- Experimental program (1)
Preferred embodiments of the present invention are given below and described in detail with reference to Figs. 1 to 7, so that the functions and characteristics of the present invention can be better understood.
The present invention designs a ring-cylinder-shaped on-chip space that realizes data prefetching, data reuse, space reuse, integer-pixel motion estimation, sub-pixel motion estimation, and motion compensation, thereby accelerating processing, reducing memory bandwidth, and saving on-chip storage.
The ring-cylinder on-chip space is defined as follows: perimeter of the cylinder = EXT+2*SRX+16+2*SRX+EXT+H*16; height of the cylinder = EXT+2*SRY+16+2*SRY+EXT+(V-1)*16, where 16 is the width and height of a macroblock, SRX is the search range in the x direction, SRY is the search range in the y direction, and EXT is the number of integer pixels above, below, left of, and right of the optimal integer-pixel point required for 1/4-pixel interpolation. For AVS and H.264, EXT=3. H=1, 2, or 3 and V=1 or 2. Refer to Figure 1 (the ring-cylinder on-chip space) and Figure 2 (the cylinder unrolled into a rectangle).
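As an illustrative sketch, the two dimension formulas above can be written directly in code; the helper names are editorial, not from the patent:

```python
# Sketch of the ring-cylinder dimensions from the formulas above.
def ring_perimeter(srx, ext, h):
    """Perimeter in pixels: EXT + 2*SRX + 16 + 2*SRX + EXT + H*16."""
    return ext + 2 * srx + 16 + 2 * srx + ext + h * 16

def ring_height(sry, ext, v):
    """Height in pixels: EXT + 2*SRY + 16 + 2*SRY + EXT + (V-1)*16."""
    return ext + 2 * sry + 16 + 2 * sry + ext + (v - 1) * 16

# AVS/H.264 example used later in the text: EXT=3, SRX=SRY=16, H=V=1.
print(ring_perimeter(16, 3, 1))  # 102
print(ring_height(16, 3, 1))     # 86
```

With V=2 the height grows by one macroblock row (16 pixels), matching the (V-1)*16 term.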
The variables H, V, EXT, SRX, and SRY of the ring-cylinder space can be configured for different architectures and even different standards. Regardless of the architecture (PC, DSP, or VLSI), if mby is incremented by 1 each time, one row of macroblocks is coded at a time and V=1; if mby is incremented by 2 each time, two rows of macroblocks are coded together and V=2. For standard definition, V=1 or 2; for high definition, V=2, which reduces memory bandwidth more effectively. When V=1: for PC and DSP architectures, H=1, meaning that motion estimation of the current macroblock and data prefetching for the next macroblock can execute in parallel; for a VLSI pipeline architecture, H=2, so that the pixels used for sub-pixel motion estimation and motion compensation of the current macroblock are not overwritten by the pixels of the integer-pixel motion estimation of the next macroblock, whose data was already prefetched during integer-pixel motion estimation. When V=2: for PC and DSP architectures, H=2, and the first two macroblocks of the first row in each two-row pair must be coded consecutively before motion estimation and data prefetching proceed in parallel; for a VLSI pipeline architecture, H=3, and the first three macroblocks of the first row in each two-row pair must be coded consecutively, with data prefetched before motion estimation. EXT is configured according to the sub-pixel scheme: for AVS and H.264, EXT=3; for MPEG-2 and MPEG-4 Part 2, EXT=1. SRX and SRY are configured according to the motion amplitude of the image and the resources of the encoder, for example SRX=SRY=16, or SRX=32 and SRY=16.
Data prefetching proceeds as follows. Let width and height be the width and height of the image, and let (mbx, mby) be the coordinates of the current macroblock in macroblock units. When mbx=0 there are three cases, A, B, and C, according to whether the prefetch window extends past the top of the image (case A), lies entirely inside it (case B), or extends past the bottom (case C). The prefetching is completed in the following three steps:
In the first step, the pixels of the reference frame are prefetched to the corresponding position in the ring-cylinder space. The gray part in the figure is a rectangle of 16+2*SRX+EXT pixels horizontally and EXT+2*SRY+16+2*SRY+EXT+(V-1)*16-(up_dy>0?up_dy:bot_dy>0?bot_dy:0) pixels vertically, where up_dy=EXT+2*SRY-mby*16 and bot_dy=mby*16+16+2*SRY+EXT-height. For case A, up_dy>0 (the figure shows mby=0); for case B, up_dy<=0 and bot_dy<=0; for case C, bot_dy>0 (the figure shows mby=height/16-1 with height divisible by 16).
In the second step, each pixel in the leftmost column of the prefetched gray area is extended EXT+2*SRX pixels to the left in parallel, forming a rectangle of EXT+2*SRX pixels horizontally and EXT+2*SRY+16+2*SRY+EXT+(V-1)*16-(up_dy>0?up_dy:bot_dy>0?bot_dy:0) pixels vertically; up_dy and bot_dy are as above.
In the third step, the prefetched top row of pixels (including the left-extended pixels) is replicated vertically upwards to the top of the ring-cylinder space, or the prefetched bottom row of pixels (including the left-extended pixels) is replicated vertically downwards to the bottom of the ring-cylinder space. For case A, this forms a rectangle of EXT+2*SRX+16+2*SRX+EXT pixels horizontally and EXT+2*SRY-mby*16 pixels vertically; for case B, there is no blank area above or below the gray region and nothing further is required; for case C, it forms a rectangle of EXT+2*SRX+16+2*SRX+EXT pixels horizontally and mby*16+16+2*SRY+EXT-height pixels vertically.
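The three-case height computation above can be sketched as follows; the helper and its name are illustrative, with the AVS-style parameters SRY=16, EXT=3, V=1 assumed:

```python
# Sketch of the mbx == 0 prefetch-height computation described above.
def prefetch_case(mby, height, sry=16, ext=3, v=1):
    up_dy = ext + 2 * sry - mby * 16                  # overhang above the image top
    bot_dy = mby * 16 + 16 + 2 * sry + ext - height   # overhang below the bottom
    full = ext + 2 * sry + 16 + 2 * sry + ext + (v - 1) * 16
    # Pixels actually fetched: full height minus whichever overhang applies.
    fetched = full - (up_dy if up_dy > 0 else (bot_dy if bot_dy > 0 else 0))
    case = 'A' if up_dy > 0 else ('C' if bot_dy > 0 else 'B')
    return case, fetched

# D1 (height=576): the top macroblock row is case A, a middle row is case B.
print(prefetch_case(0, 576))   # ('A', 51)
print(prefetch_case(10, 576))  # ('B', 86)
```

The missing up_dy (or bot_dy) rows are then filled by the edge replication of the third step.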
When mbx != 0, see Figure 4 (data prefetch, mbx != 0); there are nine cases: A1, A2, A3, B1, B2, B3, C1, C2, and C3. The conditions for A, B, and C are the same as when mbx=0; conditions 1, 2, and 3 distinguish, in the horizontal direction, whether the 16-pixel column to be prefetched lies entirely inside the image, only partly inside it, or entirely beyond its right edge. The data prefetching is completed in the following two steps:
In the first step, data is prefetched from the reference frame to the corresponding position in the ring-cylinder space. The gray part in the figure is a pixel bar 16 pixels wide and EXT+2*SRY+16+2*SRY+EXT+(V-1)*16-(up_dy>0?up_dy:bot_dy>0?bot_dy:0) pixels high; up_dy and bot_dy are as above. In case 1, all 16 pixels of each row of the bar are prefetched from off-chip to on-chip; in case 2, each row is prefetched only up to the right edge of the image, and the remaining pixels are filled by extending the rightmost image pixels; in case 3, the bar lies entirely beyond the right edge and is filled by right-edge extension alone, with no off-chip prefetch. In each case, a macroblock bar 16 pixels wide and EXT+2*SRY+16+2*SRY+EXT+(V-1)*16-(up_dy>0?up_dy:bot_dy>0?bot_dy:0) pixels high is obtained.
In the second step, the 16 pixels of the top row of the macroblock bar obtained in the previous step are replicated vertically upwards to the top of the ring cylinder, or the 16 pixels of the bottom row are replicated vertically downwards to its bottom. For case A, upward extension forms a macroblock strip 16 pixels wide and EXT+2*SRY-mby*16 pixels high; for case B, no extension is required; for case C, downward extension forms a macroblock strip 16 pixels wide and mby*16+16+2*SRY+EXT-height pixels high. After this step, a macroblock bar 16 pixels wide and EXT+2*SRY+16+2*SRY+EXT+(V-1)*16 pixels high (the height of the ring cylinder) is obtained.
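The vertical edge replication of this second step can be sketched as below, assuming the on-chip bar is modeled as a list of 16-pixel rows; the helper name and representation are illustrative:

```python
# Sketch of the second step: pad a prefetched macroblock bar to the full
# cylinder height by replicating the top row (case A) or bottom row (case C).
def extend_bar(rows, full_height, case):
    """rows: list of 16-pixel rows; returns a bar of full_height rows."""
    pad = full_height - len(rows)
    if case == 'A':
        return [rows[0]] * pad + rows   # replicate the top row upwards
    if case == 'C':
        return rows + [rows[-1]] * pad  # replicate the bottom row downwards
    return rows                         # case B: already at full height

bar = [[i] * 16 for i in range(51)]     # 51 fetched rows (case A example)
full = extend_bar(bar, 86, 'A')
print(len(full))          # 86
print(full[0] == bar[0])  # True: padded rows copy the original top row
```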
The data prefetching process above describes how the pixels of the off-chip reference frame are prefetched into the ring-cylinder on-chip space. Whether or not width and height are divisible by 16 requires no special treatment; it is handled implicitly in the implementation. Integer pixels are not padded beyond the search range, only integer pixels are prefetched, and sub-pixels are not stored: this both reduces storage and reduces the prefetch bandwidth. The reduction of memory bandwidth, the reuse of data, and the reuse of space are all reflected in this process; the bandwidth reduction is achieved mainly through data reuse.
Data reuse occurs at two levels: the reuse of data prefetched from off-chip to on-chip, and the reuse of data sent from on-chip memory to the processing unit. For the first level, see Figure 5 (data reuse between adjacent macroblocks; A: horizontal direction, B: vertical direction). Horizontal reuse occurs when left and right adjacent macroblocks are encoded in sequence. The window for a single macroblock is a rectangle of height EXT+2*SRY+16+2*SRY+EXT and width EXT+2*SRX+16+2*SRX+EXT, that is, the gray part in the figure plus a 16-pixel-wide vertical macroblock strip (white part); the gray (overlapping) part need not be prefetched again, which greatly reduces memory bandwidth. Vertical reuse occurs when V=2; the window for a single macroblock is the same rectangle of height EXT+2*SRY+16+2*SRY+EXT and width EXT+2*SRX+16+2*SRX+EXT, that is, the gray part in the figure plus a 16-pixel-high horizontal macroblock strip (white part); again, the overlapping gray part need not be prefetched, reducing memory bandwidth further. The EXT margin of the space, used for sub-pixel interpolation, is not handled separately and is not prefetched repeatedly; it too is reused, reducing bandwidth still further.
For the second level of data reuse, see Figure 6 (data reuse between adjacent pixels, gray part; A: horizontal direction, B: vertical direction). This level of reuse occurs when data in the on-chip space is sent to the processing unit. After one match, 256 reference pixels are already in the processing unit. If the next matching position is the horizontally adjacent pixel, only the 16 pixels of the column next to the side of the reference macroblock (the slender white bar in the figure) are sent to the processing unit; if the next matching position is the vertically adjacent pixel, only the 16 pixels of the row next to the lower (or upper) side of the reference macroblock (the slender white bar in the figure) are sent. In this way, 15x16 or 16x15 pixels (the gray part in the figure) are reused for each match, which reduces the bandwidth of data transmission between the on-chip memory and the processing unit and speeds up processing.
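The sliding-window reuse just described can be sketched as follows: shifting a 16x16 reference window one pixel to the right keeps 15 of its 16 columns, so only 16 new pixels move from on-chip memory to the processing unit per match (the representation is illustrative):

```python
# Sketch of the second reuse level: slide a 16x16 reference window one pixel
# to the right by appending one new 16-pixel column.
def slide_right(window, new_column):
    """window: 16 rows of 16 pixels; new_column: 16 pixels."""
    return [row[1:] + [px] for row, px in zip(window, new_column)]

win = [[(r, c) for c in range(16)] for r in range(16)]
nxt = slide_right(win, [(r, 16) for r in range(16)])
# Count pixels of the new window that were already present in the old one.
reused = sum(1 for r in range(16) for c in range(15) if nxt[r][c] == win[r][c + 1])
print(reused)  # 240, i.e. 15x16 pixels reused; only 16 transferred
```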
The saving of on-chip space is achieved mainly through space reuse, which works as follows: when mbx=H, the prefetched data has filled the entire ring-cylinder space, and each time mbx increases by 1, the newly prefetched data overwrites one vertical pixel bar to the right; the space is reused in this way. Refer to Figure 1 (the ring-cylinder on-chip space, with vertical pixel bars separated by dotted lines). Example: since the perimeter of the cylinder = EXT+2*SRX+16+2*SRX+EXT+H*16 and the logical image width = EXT+2*SRX+width+2*SRX+EXT, then for standard-definition D1 with width=720, height=576, SRX=16, H=V=1, and EXT=3, the logical image width / cylinder perimeter = (3+2*16+720+2*16+3)/(3+2*16+16+2*16+3+16) = 790/102 = 7.7; that is, over the encoding of each row of macroblocks, the ring-cylinder space is reused at least 7 times. With 576/16 = 36 macroblock rows, 36*7 = 252, so for one D1 reference frame the ring-cylinder space is reused 252 times during motion estimation.
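The D1 arithmetic quoted above checks out directly:

```python
# Check of the D1 space-reuse example: 790/102 = 7.7, 36*7 = 252.
width, height = 720, 576
srx, ext, h_cfg = 16, 3, 1
perimeter = ext + 2 * srx + 16 + 2 * srx + ext + h_cfg * 16  # 102
logical_width = ext + 2 * srx + width + 2 * srx + ext        # 790
print(round(logical_width / perimeter, 1))                   # 7.7
rows = height // 16                                          # 36 macroblock rows
print(rows * (logical_width // perimeter))                   # 252 reuses per frame
```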
Integer-pixel motion estimation: see Figure 2 (the cylinder unrolled into a rectangle); the search window, the origin (0, 0), the predicted point MVP (pmvx, pmvy), and the matching point MV (mvx, mvy) in the figure reflect the relationship between integer-pixel motion estimation and the ring-cylinder space. Integer-pixel motion estimation searches around MVP (pmvx, pmvy), within -SRX to +SRX horizontally and -SRY to +SRY vertically.
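Since the cylinder wraps horizontally, one natural way to map image coordinates to on-chip positions is modular addressing over the perimeter; this addressing helper is an editorial illustration, not taken from the patent text:

```python
# Illustrative mapping of an image x coordinate onto the ring: the cylinder
# wraps horizontally, so the on-chip column is taken modulo the perimeter
# (102 for the D1 configuration computed earlier).
def ring_x(image_x, perimeter=102):
    return image_x % perimeter

# Consecutive image columns wrap around the cylinder seamlessly.
print([ring_x(x) for x in (100, 101, 102, 103)])  # [100, 101, 0, 1]
```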
Sub-pixel motion estimation: around the optimal integer pixel, 1/2-pixel interpolation is performed first to find the optimal 1/2-pixel point; then 1/4-pixel interpolation is performed around the optimal 1/2-pixel point to find the optimal 1/4-pixel point. Even when the optimal integer point (bmvx, bmvy) lies close to the boundary point (emvx, emvy) of the search window, the EXT-pixel margin stored in the ring-cylinder space supplies the neighboring integer pixels required for the interpolation.
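The half-then-quarter refinement can be sketched as a two-pass local search; a generic cost function stands in for real SAD over interpolated pixels, so the example is illustrative only:

```python
# Sketch of two-stage sub-pixel refinement: a half-pel pass, then a
# quarter-pel pass, each testing the 3x3 neighborhood of the current best.
def refine(cost, bx, by):
    """cost(x, y) -> float; (bx, by) is the best integer-pixel vector."""
    best = (bx, by)
    for step in (0.5, 0.25):          # half-pel pass, then quarter-pel pass
        cx, cy = best
        candidates = [(cx + dx, cy + dy)
                      for dx in (-step, 0, step) for dy in (-step, 0, step)]
        best = min(candidates, key=lambda p: cost(p[0], p[1]))
    return best

# Quadratic toy cost whose true minimum lies at the quarter-pel point (1.25, -0.75).
mv = refine(lambda x, y: (x - 1.25) ** 2 + (y + 0.75) ** 2, 1, -1)
print(mv)  # (1.25, -0.75)
```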
Motion compensation: when the optimal motion vector MV (mvx, mvy) is an integer-pixel point, no sub-pixel interpolation is needed and the motion compensation can be taken directly from the search window; when MV (mvx, mvy) is a 1/2-pixel point, no 1/4-pixel interpolation is needed and the 1/2-pixel interpolated values serve as the motion compensation; when MV (mvx, mvy) is a 1/4-pixel point, 1/2-pixel interpolation is done first and then 1/4-pixel interpolation produces the motion-compensated result. The sub-pixel interpolation in motion compensation is the same as that in sub-pixel motion estimation.
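The three-way case split above amounts to classifying each motion-vector component by its fractional part; representing components in quarter-pel units (an assumed convention) makes this a pair of modulo tests:

```python
# Sketch of the motion-compensation case split, with motion-vector
# components expressed in quarter-pel units (an assumed convention).
def interp_stages(mv_quarter_units):
    """Return which interpolation stages a component requires."""
    if mv_quarter_units % 4 == 0:
        return 'none'          # integer-pel: copy from the search window
    if mv_quarter_units % 2 == 0:
        return 'half'          # half-pel: stop after 1/2-pel interpolation
    return 'half+quarter'      # quarter-pel: 1/2-pel then 1/4-pel interpolation

print([interp_stages(v) for v in (8, 6, 5)])  # ['none', 'half', 'half+quarter']
```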
Integer-pixel motion estimation, sub-pixel motion estimation, and motion compensation share the ring-cylinder on-chip space; see the gray modules in the module diagram of Figure 7. Through data prefetching, the processing unit obtains data in time, which speeds up processing; through data reuse, memory bandwidth is reduced; through space reuse, on-chip storage is reduced.
The preferred embodiments are described above so that any person skilled in the art can use the present invention. Those skilled in the art can make various modifications or changes to these embodiments without departing from the principle of the present invention, and it should be understood that such modifications or changes remain within the protection scope of the present invention.