Systems and methods for frame rate up-conversion of video data
By using object-based MC-FRUC technology, multiple reference frames and variable block sizes are used to process occluded areas, solving the motion jitter and blurring problems caused by occlusion areas in existing technologies, thus improving video quality and viewing experience.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Patents(China)
- Current Assignee / Owner
- BEIJING DAJIA INTERNET INFORMATION TECH CO LTD
- Filing Date
- 2021-08-18
- Publication Date
- 2026-06-30
AI Technical Summary
Existing frame rate upconversion methods can easily cause motion jitter or blurring of moving objects when dealing with occluded areas, especially when dealing with combined occluded areas, thus affecting video quality.
An object-based MC-FRUC technique is employed that uses multiple reference frames and variable block sizes: a target object map is generated through motion vector classification and then projected onto the multiple reference frames to detect and process occluded areas.
It improves the visual quality of video data, reduces motion jitter and blur of moving objects, and enhances the video viewing experience.
Figure CN114079775B_ABST
Abstract
Description
[0001] Cross-reference to related applications
[0002] This application claims priority to U.S. Application No. 63/068,984, filed August 21, 2020, entitled "System and apparatus for frame rate upconversion", the entire contents of which are incorporated herein by reference.
Technical Field
[0003] This disclosure relates to the field of video processing, and more specifically to methods and systems for performing frame rate upconversion (FRUC) of video data using multiple reference frames and variable block sizes.
Background Technology
[0004] FRUC can be used to improve the visual quality of video data by converting input video with a lower frame rate to output video with a higher frame rate. For example, an input video with 30 frames per second (fps) can be converted to an output video with 60 fps, 120 fps, or other higher frame rates. Compared to the input video, the output video with a higher frame rate provides users with smoother motion and a more enjoyable viewing experience.
[0005] FRUC can also be used in low-bandwidth applications. For example, some frames in a video can be dropped during the encoding process on the transmitter side, allowing the video to be transmitted with lower bandwidth. The dropped frames can then be regenerated through interpolation during the decoding process on the receiver side. For instance, the frame rate of the video can be halved by dropping every other frame during the encoding process on the transmitter side, and then the frame rate can be restored on the receiver side using FRUC through frame interpolation.
[0006] Existing FRUC methods can be broadly classified into three categories. The first category uses multiple received video frames to interpolate additional frames without considering complex motion models. Frame repetition and frame averaging are two typical examples of this category. In frame repetition, the frame rate is increased by simply repeating or copying received frames. In frame averaging, additional frames are interpolated by a weighted average of multiple received frames. Given the simplicity of these methods, their drawbacks are also apparent, including motion jitter or blurring of moving objects when the video content contains such objects. The second category (so-called motion-compensated FRUC (MC-FRUC)) is more advanced because it utilizes motion information to perform motion compensation (MC) to generate interpolated frames. The third category utilizes neural networks. For example, synthetic networks can be trained and developed to generate interpolated frames using neural networks and deep learning. Motion field information derived using conventional motion estimation or deep learning-based methods can also be fed into the network for frame interpolation.
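The two first-category methods can be illustrated with a minimal sketch. The frame representation (nested lists of grayscale values) and the function names `repeat_frames` and `average_frames` are assumptions made for illustration and do not come from the disclosure.

```python
from typing import List

# Hypothetical frame type: a grayscale image stored as rows of pixel values.
Frame = List[List[float]]


def repeat_frames(frames: List[Frame]) -> List[Frame]:
    """Frame repetition: double the frame rate by emitting each frame twice."""
    out: List[Frame] = []
    for frame in frames:
        out.append(frame)
        out.append(frame)
    return out


def average_frames(frames: List[Frame], weight: float = 0.5) -> List[Frame]:
    """Frame averaging: insert a weighted average between each pair of neighbors.

    `weight` is the contribution of the later frame (0.5 gives a plain average).
    """
    out: List[Frame] = []
    for prev, nxt in zip(frames, frames[1:]):
        out.append(prev)
        blended = [
            [(1 - weight) * p + weight * n for p, n in zip(row_prev, row_next)]
            for row_prev, row_next in zip(prev, nxt)
        ]
        out.append(blended)
    if frames:
        out.append(frames[-1])  # the last frame has no successor to blend with
    return out
```

Neither function uses motion information, which is why a moving object is either held in place (repetition) or ghosted across two positions (averaging).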
[0007] In existing FRUC methods, when a block is detected as "covered and uncovered," there is no suitable reference frame to perform motion compensation because there is no correct reference block / pixel in either previous or subsequent reference frames. While hole-filling methods may be helpful in some cases, properly handling this situation remains one of the most challenging aspects of FRUC.
[0008] This disclosure provides an improved method and system for using multiple reference frames and variable block sizes in MC-FRUC.
Summary of the Invention
[0009] Embodiments of this disclosure provide a method for performing frame rate upconversion on video data comprising a sequence of image frames. The method may include determining a set of motion vectors for a target frame relative to a plurality of reference frames by a video processor. The target frame is generated and inserted into the sequence of image frames. The method may further include performing motion vector classification on the set of motion vectors by the video processor to generate a target object map for the target frame. The method may further include projecting the target object map onto the plurality of reference frames based on the set of motion vectors by the video processor to generate a plurality of reference object maps. The method may further include detecting occlusion regions in the target frame by the video processor based on the set of motion vectors, the target object map, and the plurality of reference object maps.
[0010] Embodiments of this disclosure also provide a system for performing frame rate upconversion on video data comprising a sequence of image frames. The system may include a memory configured to store the sequence of image frames. The system may also include a video processor configured to determine a set of motion vectors for a target frame relative to a plurality of reference frames. The target frame is generated and inserted into the sequence of image frames. The video processor may also be configured to perform motion vector classification on the set of motion vectors to generate a target object map for the target frame. The video processor may be further configured to project the target object map onto the plurality of reference frames based on the set of motion vectors to generate a plurality of reference object maps. The video processor may be further configured to detect occlusion regions in the target frame based on the set of motion vectors, the target object map, and the plurality of reference object maps.
[0011] Embodiments of the present invention also provide a non-transitory computer-readable storage medium configured to store instructions, which, when executed by a video processor, cause the video processor to perform a process for performing frame rate upconversion on video data comprising a sequence of image frames. The video processing may include determining a set of motion vectors for a target frame relative to a plurality of reference frames. The target frame is generated and inserted into the sequence of image frames. The video processing may further include performing motion vector classification on the set of motion vectors to generate a target object map for the target frame. The video processing may further include projecting the target object map onto the plurality of reference frames based on the set of motion vectors to generate a plurality of reference object maps. The video processing may further include detecting occlusion regions in the target frame based on the set of motion vectors, the target object map, and the plurality of reference object maps.
[0012] It should be understood that the foregoing general description and the following detailed description are merely exemplary and illustrative, and do not limit the invention as claimed.
Attached Figure Description
[0013] Figure 1 shows a block diagram of an exemplary system for performing FRUC on video data according to embodiments of the present disclosure.
[0014] Figure 2A shows a block diagram illustrating an exemplary process for performing FRUC on video data according to embodiments of the present disclosure.
[0015] Figure 2B is a graphical representation illustrating the interpolation process of a target frame based on a plurality of reference frames according to an embodiment of the present disclosure.
[0016] Figure 3 is a flowchart of an exemplary method for performing FRUC on video data according to embodiments of the present disclosure.
[0017] Figure 4 is a flowchart of an exemplary method for determining a set of motion vectors of a target frame relative to a plurality of reference frames, according to embodiments of the present disclosure.
[0018] Figure 5 is a flowchart of an exemplary method for generating a target object map for a target frame, according to embodiments of the present disclosure.
[0019] Figures 6A to 6B show a flowchart of an exemplary method for performing occlusion detection on a target block according to embodiments of the present disclosure.
[0020] Figure 7 is a graphical representation illustrating a bidirectional matching motion estimation process according to an embodiment of the present disclosure.
[0021] Figure 8A is a graphical representation illustrating the forward motion estimation process according to an embodiment of the present disclosure.
[0022] Figure 8B is a graphical representation illustrating the backward motion estimation process according to an embodiment of the present disclosure.
[0023] Figure 9 is a graphical representation illustrating an exemplary motion vector scaling process according to an embodiment of the present disclosure.
[0024] Figure 10A is a graphical representation illustrating a process for generating an exemplary target object map according to embodiments of the present disclosure.
[0025] Figures 10B to 10D are graphical representations illustrating a process for generating exemplary reference object maps based on the target object map of Figure 10A, according to embodiments of the present disclosure.
[0026] Figure 10E is a graphical representation illustrating a process for determining an exemplary occlusion detection result for a target block based on the target object map of Figure 10A, according to an embodiment of the present disclosure.
[0027] Figure 11A is a graphical representation illustrating a process for determining a first occlusion detection result for a target block according to an embodiment of the present disclosure.
[0028] Figure 11B is a graphical representation illustrating a process for determining a second occlusion detection result for the target block of Figure 11A, according to an embodiment of the present disclosure.
Detailed Implementation
[0029] Reference will now be made in detail to exemplary embodiments, examples of which are shown in the accompanying drawings. Wherever possible, the same reference numerals will be used throughout the drawings to refer to the same or similar parts.
[0030] MC-FRUC technology may include inserting additional frames into a video using motion compensation of moving objects. Motion compensation can be performed using motion information of moving objects, resulting in interpolated frames with smoother motion. Generally, an MC-FRUC system may include a motion estimation module, an occlusion detector, and a motion compensation module. The motion estimation module determines the motion vector of the interpolated frame (also referred to herein as the target frame) relative to one or more reference frames based on a distortion metric. The occlusion detector detects whether an occlusion scene occurs in the target frame. In response to detecting an occlusion scene, the occlusion detector determines the occlusion region of the occluded scene in the target frame.
[0031] In some implementations, the occlusion detector can detect unoccluded regions, occluded regions, or both in the target frame through motion trajectory tracking. The motion compensation module can generate image content (or pixel values) for the unoccluded regions by referencing both the most recent previous frame (a reference frame immediately preceding the target frame) and the most recent subsequent frame (a reference frame immediately following the target frame). Occluded regions can include, for example, covered occluded regions, uncovered occluded regions, or combined occluded regions. For each covered or uncovered occluded region, the motion compensation module can generate image content (or pixel values) for the region in the target frame by referencing either the most recent previous frame or the most recent subsequent frame. Overlapped block motion compensation (OBMC) can also be used to reduce blocking artifacts and improve visual quality.
[0032] For example, suppose a region (e.g., multiple pixels or pixel blocks) in a target frame is detected as having a "covered" occlusion state relative to the most recent previous and subsequent frames. This means that the region is exposed in the most recent previous frame but is covered by one or more other objects in the most recent subsequent frame. This region can be referred to as a covered occluded region. For each target block in this region, no matching block (or matching pixel) for the target block can be found in the most recent subsequent frame. Only the corresponding reference block (or corresponding pixel block) in the most recent previous frame can be identified as a matching block and used for motion compensation of the target block.
[0033] In another example, suppose a region in the target frame is detected as having an "uncovered" occlusion state, meaning that the region was covered in the most recent previous frame but revealed in the most recent subsequent frame. This region can be referred to as an uncovered occluded region. For each target block in this region, no matching block for the target block can be found in the most recent previous frame. Only the corresponding reference block in the most recent subsequent frame can be identified as a matching block and used for motion compensation of the target block.
[0034] In another example, suppose a region is detected as having a combined occlusion state (e.g., a "covered and uncovered" occlusion state), meaning that the region is covered (not exposed) in both the most recent previous frame and the most recent subsequent frame. This region can be referred to as a combined occlusion region. For example, the region is covered by one or more first objects in the most recent previous frame and also by one or more second objects in the most recent subsequent frame, making the region not exposed in either the most recent previous frame or the most recent subsequent frame. For each target block in this region, no matching block for the target block can be found from either the most recent previous frame or the most recent subsequent frame. In this case, additional processing may be needed to interpolate the pixels in the target block. For example, hole-filling methods such as spatial interpolation (e.g., image inpainting) could be used to fill the region.
[0035] However, because a matching block for each target block within a combined occluded region cannot be found in the most recent previous and subsequent frames, motion jitter or blurring of the moving objects can occur if the image content of the combined occluded region includes moving objects with complex motion. The video viewing experience may be degraded due to motion jitter or blurring of moving objects. Appropriate handling of occluded regions in target frames (especially combined occluded regions) is a challenge in FRUC to improve the visual quality of video data.
[0036] In this disclosure, an object-based MC-FRUC technique is provided. More specifically, a system and method for performing FRUC on video data using multiple reference frames and variable block sizes are disclosed. The object-based MC-FRUC technique described herein can properly handle occlusion regions of a target frame using multiple reference frames instead of just two recent reference frames (such as the most recent previous frame and the most recent subsequent frame).
[0037] For example, for a target block included in a combined (“covered and uncovered”) occluded region, since no matching block for the target block can be found from the two nearest reference frames, the object-based MC-FRUC technique described herein references additional reference frames (instead of just the two nearest reference frames). The object-based MC-FRUC technique described herein obtains one or more matching blocks for the target block from the additional reference frames. In this case, the target block is no longer classified as a combined occluded state and can be removed from the combined occluded region. Depending on the number of one or more matching blocks and in which one or more additional reference frames one or more matching blocks can be found, the target block can be converted into an unoccluded target block, a covered occluded target block, or an uncovered occluded target block. As a result, the image content (or pixels) of the target block can be generated based on one or more matching blocks, making it possible to reduce or eliminate potential motion jitter or blur of moving objects in the video data. This improves the visual quality of the video data.
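The reclassification described above can be sketched as a small decision function. The mapping below is an assumption inferred from the covered, uncovered, and combined cases described earlier (a match only in previous frames means covered, only in subsequent frames means uncovered); the function name `occlusion_state` and the boolean match flags are hypothetical, and the disclosure's actual decision procedure may weigh the number and position of matches differently.

```python
from typing import Iterable


def occlusion_state(prev_matches: Iterable[bool], next_matches: Iterable[bool]) -> str:
    """Classify a target block from per-reference-frame match flags.

    prev_matches: one flag per previous reference frame (nearest first).
    next_matches: one flag per subsequent reference frame (nearest first).
    """
    has_prev = any(prev_matches)
    has_next = any(next_matches)
    if has_prev and has_next:
        return "unoccluded"
    if has_prev:
        return "covered"      # visible earlier, hidden later
    if has_next:
        return "uncovered"    # hidden earlier, visible later
    return "combined"         # no match in any examined reference frame


# With only the two nearest reference frames, the block is "combined".
two_frame_state = occlusion_state([False], [False])
# Adding a second previous frame that contains a match changes the state.
four_frame_state = occlusion_state([False, True], [False, False])
```

In this sketch, widening the set of reference frames can only move a block out of the combined state, never into it, which mirrors the motivation for using more than two reference frames.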
[0038] According to this disclosure, the object-based MC-FRUC technology disclosed herein can improve the video rendering performance of a video processor (or video processing computer). Therefore, the video viewing experience provided through the video processor (or video processing computer) or through a display coupled to the video processor (or video processing computer) can be enhanced. For example, video display quality can be improved by reducing potential motion jitter or blurring of moving objects that may occur during FRUC. Motion artifacts in video data can be reduced, allowing the processor (or computer) to display video with smoother motion.
[0039] According to this disclosure, the object-based MC-FRUC technique disclosed herein provides a specific and detailed solution for improving video display quality when applying FRUC. Specifically, occlusion detection of a target frame can be improved (or refined) based on the motion vector set, the target object map, and the multiple reference object maps by a series of operations including (1) performing motion vector classification on the motion vector set of the target frame to generate a target object map for the target frame and (2) projecting the target object map onto multiple reference frames to generate multiple reference object maps for the multiple reference frames. For example, for an occluded target block that is “covered and uncovered” and for which no matching block was found in the two most recent previous and subsequent frames, more reference frames can be used to determine one or more matching blocks for the target block, so that the image content of the target block can be generated based on one or more matching blocks to reduce potential motion artifacts. A further description of this specific and detailed solution for improving video display quality when applying FRUC is provided below.
[0040] Figure 1 shows a block diagram of an exemplary system 101 for performing FRUC on video data according to embodiments of the present disclosure. In some embodiments, system 101 may be implemented on a device with which user 112 can interact. For example, system 101 may be implemented on a server (e.g., a local server or a cloud server), workstation, game station, desktop computer, laptop computer, tablet computer, smartphone, game controller, wearable electronic device, television (TV) set, or any other suitable electronic device.
[0041] In some embodiments, system 101 may include at least one processor (such as processor 102), at least one memory (such as memory 103), and at least one storage device (such as storage device 104). It should be understood that system 101 may also include any other suitable components for performing the functions described herein.
[0042] In some embodiments, system 101 may have different modules (such as integrated circuit (IC) chips) in a single device or separate devices with dedicated functions. For example, the IC may be implemented as an application-specific integrated circuit (ASIC) or a field-programmable gate array (FPGA). In some embodiments, one or more components of system 101 may be located in a cloud computing environment or optionally in one location or distributed locations. The components of system 101 may be in an integrated device or distributed in different locations but communicate with each other via a network (not shown in the figure).
[0043] Processor 102 may include any suitable type of microprocessor, graphics processor, digital signal processor, or microcontroller suitable for video processing. Processor 102 may include one or more hardware units (e.g., portions of an integrated circuit) designed to be used with other components or designed to perform a portion of a video processing program. The program may be stored on a computer-readable medium and, when executed by processor 102, may perform one or more functions. Processor 102 may be configured as a separate processor module dedicated to performing FRUC. Alternatively, processor 102 may be configured as a shared processor module that also performs other functions unrelated to FRUC.
[0044] In some embodiments, processor 102 may be a dedicated processor tailored for video processing. For example, processor 102 may be a graphics processing unit (GPU), which is a dedicated electronic circuit designed to rapidly manipulate and modify memory to accelerate the creation of images in a frame buffer intended for output to a display device. The functionality disclosed herein may be implemented by a GPU. In another example, system 101 may be implemented in a system-on-a-chip (SoC), and processor 102 may be a media and pixel processing (MPP) processor configured to run a video encoder or decoder application. In some embodiments, the functionality disclosed herein may be implemented by an MPP processor.
[0045] The processor 102 may include several modules, such as a motion estimation module 105, an occlusion detector 107, and a motion compensation module 109. Although Figure 1 shows the motion estimation module 105, occlusion detector 107, and motion compensation module 109 within a single processor 102, they may be implemented on different processors that are close to or far from each other.
[0046] The motion estimation module 105, occlusion detector 107, and motion compensation module 109 (and any corresponding submodules or subunits) may be hardware units of the processor 102 designed to be used with other components (e.g., portions of an integrated circuit) or software units implemented by the processor 102 by executing at least a portion of a program. The program may be stored on a computer-readable medium such as memory 103 or storage device 104, and when executed by the processor 102, it may perform one or more functions.
[0047] Memory 103 and storage device 104 may include any suitable type of mass storage device configured to store any type of information that processor 102 may need to operate. For example, memory 103 and storage device 104 may be volatile or non-volatile, magnetic, semiconductor-based, magnetic tape-based, optical, removable, non-removable, or other types of storage devices or tangible (i.e., non-transitory) computer-readable media, including but not limited to ROM, flash memory, dynamic RAM, and static RAM. Memory 103 and / or storage device 104 may be configured to store one or more computer programs that can be run by processor 102 to perform the functions disclosed herein. For example, memory 103 and / or storage device 104 may be configured to store programs that can be run by processor 102 to execute FRUC. Memory 103 and / or storage device 104 may also be configured to store information and data used by processor 102.
[0048] Figure 2A shows a block diagram of an exemplary process 200 for performing FRUC on video data according to an embodiment of the present disclosure. Figure 2B is a graphical representation illustrating an interpolation process 250 of a target frame (e.g., target frame 204) based on a plurality of reference frames according to an embodiment of the present disclosure. The video data may include a sequence of image frames, and the target frame 204 may be an interpolated frame to be inserted into the sequence of image frames. Referring to Figures 2A to 2B, the object-based MC-FRUC technique disclosed herein can be implemented to generate the target frame 204 using multiple reference frames 202. The multiple reference frames 202 may include multiple original image frames from the video data that can be used for the generation and interpolation of the target frame 204.
[0049] For example, as shown in Figure 2B, the plurality of reference frames 202 may include a first previous frame 202a before the target frame 204, a first subsequent frame 202b after the target frame 204, a second previous frame 202c before the first previous frame 202a, and a second subsequent frame 202d after the first subsequent frame 202b. Although four reference frames are shown in Figure 2B, the number of reference frames used for generating and interpolating the target frame 204 can vary depending on the specific application. The target frame 204 can be located temporally at position i in display order (or timestamp), where i is a positive integer. The second previous frame 202c, the first previous frame 202a, the first subsequent frame 202b, and the second subsequent frame 202d can be located at positions i-3, i-1, i+1, and i+3 in display order, respectively. Although not shown in Figure 2B, additional target frames can also be interpolated at positions i-4, i-2, i+2, i+4, etc., respectively.
[0050] In some embodiments, the target frame 204 may be divided into multiple target blocks, each having a size of N×M pixels, where N and M are positive integers. N indicates the number of pixels in the target block along the vertical direction, and M indicates the number of pixels in the target block along the horizontal direction. In some embodiments, each of the multiple target blocks may have a variable block size (e.g., the block size is not fixed and may vary depending on the specific application). Similarly, each reference frame 202 may be divided into multiple reference blocks, each having a size of N×M pixels.
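As a rough illustration of the block partitioning, the sketch below tiles a frame with blocks of N rows by M columns. The function name `split_into_blocks`, the tuple layout, and the clipping of blocks at the right and bottom frame edges are assumptions; the disclosure does not state how frames whose dimensions are not multiples of the block size are handled.

```python
from typing import List, Tuple

# Hypothetical block descriptor: (top, left, block_height, block_width) in pixels.
Block = Tuple[int, int, int, int]


def split_into_blocks(height: int, width: int, n: int, m: int) -> List[Block]:
    """Tile a height x width frame with blocks of n rows by m columns.

    Blocks are returned in raster order. Blocks touching the right or bottom
    edge are clipped so they stay inside the frame (an assumption).
    """
    blocks: List[Block] = []
    for top in range(0, height, n):
        for left in range(0, width, m):
            blocks.append((top, left, min(n, height - top), min(m, width - left)))
    return blocks
```

Calling the function with different `n` and `m` values per frame or per region is one simple way to experiment with variable block sizes.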
[0051] Referring to Figure 2A, the motion estimation module 105 can be configured to receive a plurality of reference frames 202 and determine a set of motion vectors for the target frame 204 relative to the plurality of reference frames 202. For example, for each target block in the target frame 204, the motion estimation module 105 can determine a plurality of motion vectors for the target block relative to the plurality of reference frames 202, as described in more detail below.
[0052] In some embodiments, the plurality of reference frames 202 may include a first previous frame preceding the target frame 204 (e.g., a first previous frame 202a immediately preceding the target frame 204) and a first subsequent frame following the target frame 204 (e.g., a first subsequent frame 202b immediately following the target frame 204). For each target block in the target frame 204, the motion estimation module 105 may determine the motion vector of the target block relative to the first previous frame and the motion vector of the target block relative to the first subsequent frame.
[0053] For example, referring to Figure 2B, for target block 212 of target frame 204, the motion estimation module 105 can use an exemplary motion estimation technique described below with reference to Figure 7, Figure 8A, or Figure 8B to determine the motion vector 222 of the target block 212 relative to the first previous frame 202a and the motion vector 224 of the target block 212 relative to the first subsequent frame 202b.
[0054] In some embodiments, the plurality of reference frames 202 may further include one or more second prior frames preceding the first prior frame (e.g., a second prior frame 202c immediately preceding the first prior frame 202a) and one or more second subsequent frames following the first subsequent frame (e.g., a second subsequent frame 202d immediately following the first subsequent frame 202b). For each target block in the target frame 204, the motion estimation module 105 may also be configured to scale the motion vector of the target block relative to the first prior frame to generate a corresponding motion vector of the target block relative to each second prior frame. Furthermore, the motion estimation module 105 may also be configured to scale the motion vector of the target block relative to the first subsequent frame to generate a corresponding motion vector of the target block relative to each second subsequent frame.
[0055] For example, referring to Figure 2B, the motion estimation module 105 can scale the motion vector 222 of the target block 212 relative to the first previous frame 202a to generate a motion vector 226 of the target block 212 relative to the second previous frame 202c. Furthermore, the motion estimation module 105 can scale the motion vector 224 of the target block 212 relative to the first subsequent frame 202b to generate a motion vector 228 of the target block 212 relative to the second subsequent frame 202d. An exemplary motion vector scaling process is described in more detail below with reference to Figure 9.
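One plausible form of the scaling step is linear scaling by temporal distance, sketched below. This is an assumption: the process of Figure 9 is not reproduced in this section, and the function name `scale_motion_vector` and the constant-velocity motion model are chosen only for illustration.

```python
from typing import Tuple

# Hypothetical motion vector type: (horizontal, vertical) displacement in pixels.
MotionVector = Tuple[float, float]


def scale_motion_vector(
    mv: MotionVector, base_distance: int, target_distance: int
) -> MotionVector:
    """Scale a motion vector by the ratio of temporal distances.

    Assumes constant-velocity (linear) motion, so displacement grows in
    proportion to the number of frame positions between target and reference.
    """
    factor = target_distance / base_distance
    return (mv[0] * factor, mv[1] * factor)


# Target frame at position i, first previous frame at i-1, second previous at i-3:
# the temporal distances are 1 and 3, so the vector is tripled.
mv_222 = (2.0, -1.0)
mv_226 = scale_motion_vector(mv_222, base_distance=1, target_distance=3)
```

The same call with the motion vector relative to the first subsequent frame would give a candidate vector relative to the second subsequent frame at position i+3.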
[0056] The occlusion detector 107 can be configured to receive a set of motion vectors of the target frame 204 from the motion estimation module 105 and perform motion vector classification on the set of motion vectors to generate a target object map for the target frame 204, as described in more detail below.
[0057] In some embodiments, the occlusion detector 107 may perform motion vector classification on a set of motion vectors to detect one or more objects in the target frame 204. For example, the occlusion detector 107 may classify the set of motion vectors into one or more groups of motion vectors. In this case, similar motion vectors (e.g., motion vectors with the same or similar velocities) may be classified into the same group. For example, the k-nearest neighbor (k-NN) algorithm may be used to perform motion vector classification. Then, for each group of motion vectors, the occlusion detector 107 may determine one or more target blocks from the target frame 204, where each target block has its own motion vector classified into the group of motion vectors. The occlusion detector 107 may determine the objects corresponding to the groups of motion vectors as image regions in the target frame 204 that include one or more target blocks. By performing a similar operation on each group of motion vectors, the occlusion detector 107 may determine one or more objects corresponding to one or more groups of motion vectors.
[0058] According to this disclosure, if the difference between the velocities of two motion vectors is within a predetermined threshold, then the two motion vectors can be considered similar motion vectors. For example, if the angular difference and amplitude difference between the velocities of two motion vectors are within a predetermined angular threshold and a predetermined amplitude threshold, respectively, then the two motion vectors can be considered similar motion vectors. The predetermined angular threshold can be ±5%, ±10%, ±15%, or other suitable values. The predetermined amplitude threshold can be ±5%, ±10%, ±15%, or other suitable values.
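The similarity test and a simple grouping pass can be sketched as follows. Several choices here are assumptions: the angular threshold is expressed in radians (the text gives percentages without stating the reference quantity), the amplitude threshold is taken relative to the larger magnitude, and the greedy first-fit grouping in `group_motion_vectors` is a simplified stand-in, not the k-NN classification mentioned above.

```python
import math
from typing import List, Sequence, Tuple

MotionVector = Tuple[float, float]


def is_similar(
    mv_a: MotionVector,
    mv_b: MotionVector,
    angle_tol_rad: float = math.radians(10),  # assumed unit and value
    mag_tol: float = 0.10,                    # 10% relative amplitude difference
) -> bool:
    """Return True if both the angle and amplitude differences are within tolerance."""
    mag_a = math.hypot(*mv_a)
    mag_b = math.hypot(*mv_b)
    if mag_a == 0 or mag_b == 0:
        # Direction is undefined for a zero vector; only two zero vectors match.
        return mag_a == mag_b
    angle = abs(math.atan2(mv_a[1], mv_a[0]) - math.atan2(mv_b[1], mv_b[0]))
    angle = min(angle, 2 * math.pi - angle)  # wrap around to [0, pi]
    mag_diff = abs(mag_a - mag_b) / max(mag_a, mag_b)
    return angle <= angle_tol_rad and mag_diff <= mag_tol


def group_motion_vectors(mvs: Sequence[MotionVector]) -> List[int]:
    """Greedy first-fit grouping: one group label per input motion vector."""
    representatives: List[MotionVector] = []
    labels: List[int] = []
    for mv in mvs:
        for idx, rep in enumerate(representatives):
            if is_similar(mv, rep):
                labels.append(idx)
                break
        else:
            representatives.append(mv)  # start a new group with this vector
            labels.append(len(representatives) - 1)
    return labels
```

If the input holds one motion vector per target block in raster order, the returned labels can be reshaped into a per-block grid that indicates which group, and hence which object, each block belongs to.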
[0059] According to this disclosure, an object can be an image region in an image frame that has the same or similar motion vectors. The objects disclosed herein may include multiple real-world objects. For example, if multiple real-world objects have zero motion vectors, these real-world objects can be detected as background objects in an object map.
[0060] In some embodiments, the occlusion detector 107 may generate a target object map for the target frame 204 to include one or more objects detected in the target frame 204. For example, the target object map may depict one or more objects and indicate which of the one or more objects each target block of the target frame 204 belongs to. The generation of an exemplary target object map is described in more detail below with reference to Figure 10A.
[0061] In some embodiments, the occlusion detector 107 may determine one or more relative depth values for one or more objects in a target object map. For example, these relative depth values may be determined based on one or more features of the objects. Object features may include, for example, the object's size (e.g., indicated by its area), the average magnitude of the object's motion vectors, etc. The relative depth values of one or more objects may be used as a metric to indicate which object is relatively closer to the camera. Specifically, a smaller relative depth value for an object indicates that the object is closer to the camera than another object with a larger relative depth value.
[0062] In some embodiments, the object with the largest area in the target frame 204 may be identified as the background area (background object) and assigned a maximum relative depth value. Any other object detected in the target frame 204 may be assigned a relative depth value less than that of the background object. For example, one or more other objects detected in the target frame 204 may be assigned the same relative depth value less than that of the background object. In another example, one or more other objects detected in the target frame 204 may be assigned one or more different relative depth values less than that of the background object. When any other object overlaps with the background object, it can be determined that the other object covers the background object.
[0063] Since each object can be assigned a relative depth value, target blocks included in the same object are assigned the object's relative depth value. In other words, each target block included in an object can have the same relative depth value as the object. Therefore, the target object map of target frame 204 can be used to indicate the corresponding relative depth value of each target block in target frame 204. That is, the corresponding relative depth value of each target block can be found from the target object map, which is useful for determining the occlusion detection result of the target block, as described in more detail below.
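The depth assignment described above can be sketched as follows. The function names are hypothetical, and the rule used for non-background objects (ranking them by area so that smaller objects receive smaller depth values) is only one illustrative choice; the text requires only that the largest object receives the maximum relative depth value and that every other object receives a smaller one.

```python
from collections import Counter

def assign_relative_depths(object_map):
    """Map each object id to a relative depth (smaller means closer).

    object_map is a 2D list holding one object id per target block.
    Assumption: objects are ranked by area, so the largest object (the
    background) receives the maximum depth value.
    """
    areas = Counter(obj for row in object_map for obj in row)
    ordered = sorted(areas, key=lambda obj: (-areas[obj], obj))
    max_depth = len(ordered)
    return {obj: max_depth - rank for rank, obj in enumerate(ordered)}

def block_depth_map(object_map, depths):
    """Give every target block the relative depth of the object it belongs to."""
    return [[depths[obj] for obj in row] for row in object_map]
```

With these two helpers, the relative depth value of any target block can be looked up directly from the block's position in the target object map.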
[0064] In some embodiments, after determining the relative depth values of all objects in the target object map, the interpolation of pixels in the occluded region of the target frame 204 can be processed accordingly. For example, when multiple objects overlap in a region of the target frame 204, it can be determined that the object with the smallest depth value covers all other objects it overlaps with. That is, only the pixels of the object with the smallest relative depth value can be used to interpolate the pixels of this occluded region in the target frame 204.
[0065] The occlusion detector 107 can also be configured to detect whether an occlusion scene occurs in the target frame 204, at least in part, based on the motion vector set and the target object map of the target frame 204. In response to detecting an occlusion scene, the occlusion detector 107 can detect occlusion regions in the target frame 204, as described in more detail below.
[0066] In some embodiments, the target frame 204 may include an unoccluded region, an occluded region, or both. An unoccluded region may be an image region in the target frame 204 that includes one or more unoccluded target blocks. An occluded region may be an image region in the target frame 204 that includes one or more occluded target blocks. An occluded region may include one or more covered occluded regions, uncovered occluded regions, and combined occluded regions. A covered occluded region may include one or more occluded target blocks having a covered occluded state. An uncovered occluded region may include one or more occluded target blocks having an uncovered occluded state. A combined occluded region may include one or more occluded target blocks having a combined occluded state. A combined occluded state may be a combination of covered and uncovered occluded states (e.g., a "covered and uncovered" state). The detection of unoccluded regions, covered occluded regions, uncovered occluded regions, or combined occluded regions in the target frame 204 is described in more detail below.
[0067] In some embodiments, the occlusion detector 107 may perform an object projection process to project a target object map onto a plurality of reference frames 202 based on the motion vector set of the target frame 204, and generate a plurality of reference object maps for the plurality of reference frames 202.
[0068] For example, for each reference frame 202, the occlusion detector 107 can project each object of the target frame 204 onto the reference frame 202 to generate an object projection on the reference frame 202. Specifically, the occlusion detector 107 can project each target block of an object onto the reference frame 202 based on the motion vector of the target block relative to the reference frame 202 to generate a block projection of the target block. The block projections of all target blocks of the object can then be generated and aggregated to form an object projection of the object. By performing a similar operation to project each object identified in the target object map onto the reference frame 202, the occlusion detector 107 can generate one or more object projections for one or more objects on the reference frame 202.
[0069] For an image region in reference frame 202 that is only covered by the object projection, occlusion detector 107 can determine that the image region of reference frame 202 is covered by an object associated with the object projection. As a result, the object is identified in the reference object map of reference frame 202. Each reference block in the image region may have the same relative depth value as the object.
[0070] Optionally or additionally, for an image region where two or more object projections of reference frame 202 overlap, the object projection associated with the object having the smaller (or smallest) relative depth value is selected. For example, the two or more object projections are each associated with two or more objects. The occlusion detector 107 can determine a set of relative depth values associated with the two or more objects from the target object map and the minimum relative depth value in that set. The occlusion detector 107 can identify the object projection associated with the object having the minimum relative depth value from the two or more object projections. The object with the smaller (or smallest) relative depth value may be equivalent to the object with the minimum relative depth value among the two or more objects.
[0071] The occlusion detector 107 can determine that the image region of the reference frame 202 is covered by the object with the smaller (or smallest) relative depth value. As a result, the object with the smaller (or smallest) relative depth value can be identified in the reference object map of the reference frame 202. Each reference block in the image region can have the same relative depth value as the object in the reference object map. The generation of exemplary reference object maps is described in more detail below with reference to Figures 10B to 10D.
[0072] In another example, for each reference frame 202, the occlusion detector 107 can project multiple target blocks onto the reference frame 202 based on motion vectors of multiple target blocks relative to the reference frame 202, to generate multiple block projections. That is, the occlusion detector 107 can project each target block onto the reference frame 202 based on motion vectors of the target blocks relative to the reference frame 202 to generate block projections. The occlusion detector 107 can combine the multiple block projections at least partially based on a target object map to generate a reference object map of the reference frame 202. Specifically, for a reference block in the reference frame 202 that is only covered by the block projection of a target block, the occlusion detector 107 can determine that the reference block is covered by an object associated with the target block. As a result, the object associated with the target block is identified in the reference object map of the reference frame 202. The reference block may have the same relative depth value as the object.
[0073] Optionally or additionally, for a reference block of reference frame 202 that overlaps with two or more block projections of two or more target blocks, the block projection associated with the target block having the smaller (or smallest) relative depth value is selected. For example, two or more block projections are each associated with two or more target blocks. The occlusion detector 107 can determine a set of relative depth values associated with two or more target blocks in the target object map and the minimum relative depth value in that set. The occlusion detector 107 can identify the block projection associated with the target block having the minimum relative depth value from the two or more block projections. The target block with the smaller (or smallest) relative depth value may be equivalent to the target block with the minimum relative depth value among the two or more target blocks.
[0074] The occlusion detector 107 can determine that a reference block is covered by an object associated with a target block having a smaller (or minimum) relative depth value. As a result, the object associated with the target block having the smaller (or minimum) relative depth value is identified in the reference object map of reference frame 202. The reference block may have the same relative depth value as the target block having the smaller (or minimum) relative depth value.
[0075] As a result, a reference object map for reference frame 202 can be generated. Multiple reference blocks in reference frame 202 can be identified as being associated with one or more objects identified in the reference object map. Note that the objects identified in the reference object map may be the same as or different from the objects identified in the target object map. For example, some objects identified in the target object map may not exist in the reference object map. In another example, all objects identified in the target object map may exist in the reference object map. Since each object identified in the reference object map can be associated with a relative depth value, reference blocks included in the same object can be associated with the same relative depth value of that object. Therefore, the reference object map can be used to indicate the corresponding relative depth value of each reference block in reference frame 202. For example, the corresponding relative depth value of each reference block can be found from the reference object map, which is useful for determining the occlusion detection result of the target block, as described in more detail below.
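The block-level projection described above can be sketched as follows. This is a simplified illustration under stated assumptions: the function name `project_object_map` is hypothetical, motion vectors are assumed to be given in whole-block units as (row offset, column offset) so that each block projection lands on exactly one reference block, and reference blocks that receive no projection are marked `None`.

```python
def project_object_map(object_map, depths, motion_vectors):
    """Build a reference object map by projecting every target block.

    object_map[r][c]     : object id of the target block at (r, c)
    depths[obj]          : relative depth of each object (smaller is closer)
    motion_vectors[r][c] : (d_row, d_col) toward the reference frame, in
                           block units (an assumption made for simplicity)
    """
    rows, cols = len(object_map), len(object_map[0])
    ref_map = [[None] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            d_row, d_col = motion_vectors[r][c]
            rr, cc = r + d_row, c + d_col
            if not (0 <= rr < rows and 0 <= cc < cols):
                continue  # the projection falls outside the reference frame
            obj = object_map[r][c]
            current = ref_map[rr][cc]
            # Where projections overlap, keep the object with the smallest depth.
            if current is None or depths[obj] < depths[current]:
                ref_map[rr][cc] = obj
    return ref_map
```

Calling the function once per reference frame, each time with the motion vectors relative to that frame, yields one reference object map per reference frame.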
[0076] In some embodiments, the occlusion detector 107 may detect occlusion regions in the target frame 204 based on a motion vector set, a target object map, and multiple reference object maps for multiple reference frames 202. For example, the occlusion detector 107 may detect a set of occluded target blocks from multiple target blocks in the target frame 204 and generate an occlusion region for the target frame 204 including the set of occluded target blocks.
[0077] In some implementations, the plurality of reference frames 202 may include a first preceding frame before the target frame 204 and a first subsequent frame after the target frame 204, and the plurality of reference object maps for the plurality of reference frames 202 may include a first preceding object map for the first preceding frame and a first subsequent object map for the first subsequent frame. For each target block in the target frame 204, the occlusion detector 107 may determine a first occlusion detection result for the target block. The first occlusion detection result may indicate whether the target block is an occluded target block relative to the first preceding frame and the first subsequent frame.
[0078] For example, the occlusion detector 107 may determine a first preceding block in the first preceding frame corresponding to the target block based on the motion vector of the target block relative to the first previous frame. The occlusion detector 107 may determine the relative depth value of the first preceding block based on the first preceding object map. Next, the occlusion detector 107 may determine a first subsequent block in the first subsequent frame corresponding to the target block based on the motion vector of the target block relative to the first subsequent frame. The occlusion detector 107 may determine the relative depth value of the first subsequent block based on the first subsequent object map. Then, the occlusion detector 107 may determine a first occlusion detection result for the target block based on the relative depth value of the target block, the relative depth value of the first preceding block, and the relative depth value of the first subsequent block.
[0079] If the relative depth value of the target block is not greater than the relative depth value of the first preceding block but is greater than the relative depth value of the first subsequent block (e.g., the covered occlusion condition is satisfied), then the occlusion detector 107 can determine that the target block is an occluded target block having a covered occluded state relative to the first preceding frame and the first subsequent frame. For example, the target block is exposed in the first preceding frame but is covered by an object with a smaller relative depth value in the first subsequent frame. The matching block of the target block can be the first preceding block in the first preceding frame.
[0080] If the relative depth value of the target block is greater than the relative depth value of the first preceding block but is not greater than the relative depth value of the first subsequent block (e.g., the uncovered occlusion condition is satisfied), then the occlusion detector 107 can determine that the target block is an occluded target block having an uncovered occluded state relative to the first preceding frame and the first subsequent frame. For example, the target block is covered by an object with a smaller relative depth value in the first preceding frame but is exposed in the first subsequent frame. The matching block of the target block can be the first subsequent block in the first subsequent frame.
[0081] If the relative depth value of the target block is greater than the relative depth value of the first preceding block and also greater than the relative depth value of the first subsequent block (e.g., the combined occlusion condition is satisfied), then the occlusion detector 107 can determine that the target block is an occluded target block having a combined occluded state relative to the first preceding frame and the first subsequent frame. For example, the target block is covered by a first object in the first preceding frame and by a second object in the first subsequent frame. Each of the first object and the second object can have a relative depth value less than the relative depth value of the target block. The first object and the second object can be the same object or different objects. No matching block for the target block can be found in either the first preceding frame or the first subsequent frame.
[0082] Otherwise (e.g., none of the covered occlusion condition, the uncovered occlusion condition, and the combined occlusion condition is satisfied), the occlusion detector 107 may determine that the target block is an unoccluded target block. For example, the target block is exposed in both the first preceding frame and the first subsequent frame. Matching blocks for the target block may include the first preceding block in the first preceding frame and the first subsequent block in the first subsequent frame.
[0083] In other words, the occlusion detector 107 can determine whether the target block is an unoccluded target block, a covered occluded target block, an uncovered occluded target block, or a combined occluded target block based on the following expression (1):
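[0084]
$$
\mathrm{occlusion}(k, P1, N1) =
\begin{cases}
\text{covered}, & \text{if } D_k \le D_{R(k,P1)} \text{ and } D_k > D_{R(k,N1)} \\
\text{not covered}, & \text{if } D_k > D_{R(k,P1)} \text{ and } D_k \le D_{R(k,N1)} \\
\text{combined}, & \text{if } D_k > D_{R(k,P1)} \text{ and } D_k > D_{R(k,N1)} \\
\text{not occluded}, & \text{otherwise}
\end{cases}
\tag{1}
$$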
[0085]
[0086] In the above expression (1), k represents the index of the target block, occlusion(k, P1, N1) represents the first occlusion detection result of target block k relative to the first previous frame P1 and the first subsequent frame N1, $D_k$ represents the relative depth value of the target block k, $D_{R(k,P1)}$ represents the relative depth value of the first previous block R(k, P1) corresponding to the target block k in the first previous frame P1, and $D_{R(k,N1)}$ represents the relative depth value of the first subsequent block R(k, N1) corresponding to the target block k in the first subsequent frame N1. The first previous block R(k, P1) can be determined by projecting the target block k onto the first previous frame P1 based on the motion vector of the target block k relative to the first previous frame P1. The first subsequent block R(k, N1) can likewise be determined by projecting the target block k onto the first subsequent frame N1 based on the motion vector of the target block k relative to the first subsequent frame N1.
[0087] In the above expression (1), the "covered" result indicates that the target block k is a covered occluded target block, and a matching block of the target block k can be found in the first previous frame P1, which is the first previous block R(k, P1). The "not covered" result indicates that the target block k is an uncovered occluded target block, and a matching block of the target block k can be found in the first subsequent frame N1, which is the first subsequent block R(k, N1). The "combined" result indicates that the target block k is a combined occluded target block, and a matching block of the target block k cannot be found in the first previous frame P1 and the first subsequent frame N1. The "not occluded" result indicates that the target block k is an unoccluded target block, and two matching blocks of the target block k can be found in the first previous frame P1 and the first subsequent frame N1, which include the first previous block R(k, P1) and the first subsequent block R(k, N1).
[0088] Based on the above expression (1), the relative depth values of the target block k and its corresponding reference blocks R(k, P1) and R(k, N1) can be compared to determine whether the target block k is occluded in the corresponding reference frames P1 and N1. Then, the result of "covered", "not covered", "combined", or "not occluded" can be determined based on whether the target block k is occluded when projected onto the reference frames P1 and N1.
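The depth comparison described in the preceding paragraphs can be sketched as a small function. The function name `classify_occlusion` and its argument names are hypothetical; the four result labels and the comparison rules follow the conditions stated in the text.

```python
def classify_occlusion(d_target, d_prev, d_next):
    """Classify a target block from three relative depth values.

    d_target : relative depth of the target block k
    d_prev   : relative depth of its projected block in the previous frame
    d_next   : relative depth of its projected block in the subsequent frame
    A smaller depth value means closer to the camera.
    """
    hidden_in_prev = d_target > d_prev  # something closer covers it in the previous frame
    hidden_in_next = d_target > d_next  # something closer covers it in the subsequent frame
    if hidden_in_prev and hidden_in_next:
        return "combined"      # no matching block in either frame
    if hidden_in_next:
        return "covered"       # matching block is in the previous frame
    if hidden_in_prev:
        return "not covered"   # matching block is in the subsequent frame
    return "not occluded"      # matching blocks exist in both frames
```

For instance, a background block with depth 2 whose projection lands on background in the previous frame (depth 2) and on a foreground object in the subsequent frame (depth 1) is classified as "covered".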
[0089] By performing a similar operation on each target block in the target frame 204, the occlusion detector 107 can determine multiple first occlusion detection results for the multiple target blocks. Based on the multiple first occlusion detection results, the occlusion detector 107 can determine, from the multiple target blocks and relative to the first previous frame and the first subsequent frame, one or more unoccluded target blocks, one or more covered occluded target blocks, one or more uncovered occluded target blocks, and/or one or more combined occluded target blocks. Then, the occlusion detector 107 can determine an unoccluded region including the one or more unoccluded target blocks, a covered occluded region including the one or more covered occluded target blocks, an uncovered occluded region including the one or more uncovered occluded target blocks, and/or a combined occluded region including the one or more combined occluded target blocks.
[0090] In some implementations, the plurality of reference frames 202 may further include a second previous frame preceding the first previous frame and a second subsequent frame following the first subsequent frame. The plurality of reference object maps may also include a second previous object map for the second previous frame and a second subsequent object map for the second subsequent frame. To further improve the interpolation results of the combined occlusion region of the target frame 204, the occlusion detector 107 may determine a second occlusion detection result for each target block within the combined occlusion region. The second occlusion detection result may indicate whether the target block is an occluded target block relative to the second previous frame and the second subsequent frame.
[0091] Specifically, for each target block within the combined occluded region, that is, each target block identified as a combined occluded target block relative to the first previous frame and the first subsequent frame (e.g., no matching block is found in the first previous frame or the first subsequent frame), the occlusion detector 107 may further determine whether the target block has any matching blocks in additional reference frames (e.g., frames other than the first previous frame and the first subsequent frame). By using more reference frames, the FRUC results of the video data can be improved. For example, for each target block within the combined occluded region, the occlusion detector 107 may determine whether the target block is an unoccluded target block, a covered occluded target block, an uncovered occluded target block, or a combined occluded target block relative to the second previous frame and the second subsequent frame based on the following expression (2):
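[0092]
$$
\mathrm{occlusion}(k, P2, N2) =
\begin{cases}
\text{covered}, & \text{if } D_k \le D_{R(k,P2)} \text{ and } D_k > D_{R(k,N2)} \\
\text{not covered}, & \text{if } D_k > D_{R(k,P2)} \text{ and } D_k \le D_{R(k,N2)} \\
\text{combined}, & \text{if } D_k > D_{R(k,P2)} \text{ and } D_k > D_{R(k,N2)} \\
\text{not occluded}, & \text{otherwise}
\end{cases}
\tag{2}
$$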
[0093] In the above expression (2), k represents the index of the target block, occlusion(k, P2, N2) represents the second occlusion detection result of the target block k relative to the second previous frame P2 and the second subsequent frame N2, $D_{R(k,P2)}$ represents the relative depth value of the second previous block R(k, P2) corresponding to the target block k in the second previous frame P2, and $D_{R(k,N2)}$ represents the relative depth value of the second subsequent block R(k, N2) corresponding to the target block k in the second subsequent frame N2. The second previous block R(k, P2) can be determined by projecting the target block k onto the second previous frame P2 based on the motion vector of the target block k relative to the second previous frame P2. The second subsequent block R(k, N2) can likewise be determined by projecting the target block k onto the second subsequent frame N2 based on the motion vector of the target block k relative to the second subsequent frame N2.
[0094] In expression (2) above, the "covered" result indicates that target block k is a covered occluded target block, and a matching block of target block k can be found in the second previous frame P2, which is the second previous block R(k, P2). The "not covered" result indicates that target block k is an uncovered occluded target block, and a matching block of target block k can be found in the second subsequent frame N2, which is the second subsequent block R(k, N2). The "combined" result indicates that target block k is a combined occluded target block, and a matching block of target block k cannot be found in the second previous frame P2 and the second subsequent frame N2. The "not occluded" result indicates that target block k is an unoccluded target block, and two matching blocks of target block k can be found in the second previous frame P2 and the second subsequent frame N2, respectively, which include the second previous block R(k, P2) and the second subsequent block R(k, N2).
[0095] As a result, the occlusion detector 107 can determine one or more second occlusion detection results for one or more target blocks included in the combined occlusion region. Based on one or more second occlusion detection results, the occlusion detector 107 can determine one or more unoccluded target blocks, one or more covered occluded target blocks, one or more uncovered occluded target blocks, and / or one or more combined occluded target blocks relative to the second previous frame and the second subsequent frame from one or more target blocks in the combined occlusion region.
[0096] Then, the occlusion detector 107 may update the unoccluded area to further include one or more unoccluded target blocks relative to the second previous frame and the second subsequent frame. Optionally or additionally, the occlusion detector 107 may update the covered occluded area to further include one or more covered occluded target blocks relative to the second previous frame and the second subsequent frame. Optionally or additionally, the occlusion detector 107 may update the uncovered occluded area to further include one or more uncovered occluded target blocks relative to the second previous frame and the second subsequent frame.
[0097] Optionally or additionally, the occlusion detector 107 may also update the combined occlusion region to include only one or more combined occlusion target blocks relative to the second previous frame and the second subsequent frame. That is, one or more unoccluded target blocks, one or more covered occlusion target blocks, and / or one or more uncovered occlusion target blocks relative to the second previous frame and the second subsequent frame may be removed from the combined occlusion region because matching blocks for these target blocks can be found in the second previous frame or the second subsequent frame, or in both the second previous frame and the second subsequent frame. The updated combined occlusion region includes only one or more target blocks having a combined occlusion state relative to the first previous frame and the first subsequent frame, and the second previous frame and the second subsequent frame.
[0098] Furthermore, for each remaining target block within the combined occlusion region, the occlusion detector 107 can also determine a third (or fourth, fifth, ...) occlusion detection result for the target block relative to a third (or fourth, fifth, ...) previous frame preceding the second previous frame and a third (or fourth, fifth, ...) subsequent frame following the second subsequent frame. The description of determining additional occlusion detection results is similar to those used to describe the first and second occlusion detection results, and therefore will not be repeated here. Occlusion detection of the target frame 204 can be improved by using more reference frames.
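The progressive use of farther reference frames for blocks in the combined occlusion region can be sketched as follows. The function names are hypothetical, and the sketch assumes that the relative depth values of the projected blocks have already been looked up for each (previous, subsequent) frame pair, ordered from the nearest pair to the farthest.

```python
def classify_occlusion(d_target, d_prev, d_next):
    """Depth comparison for one (previous, subsequent) reference frame pair."""
    hidden_in_prev = d_target > d_prev
    hidden_in_next = d_target > d_next
    if hidden_in_prev and hidden_in_next:
        return "combined"
    if hidden_in_next:
        return "covered"
    if hidden_in_prev:
        return "not covered"
    return "not occluded"

def refine_with_more_references(d_target, ref_depth_pairs):
    """Try successively farther frame pairs until a matching block exists.

    ref_depth_pairs is a list of (d_prev_i, d_next_i), nearest pair first.
    Returns (result, pair_index); pair_index is None if the block stays in
    the combined state for every available pair.
    """
    for index, (d_prev, d_next) in enumerate(ref_depth_pairs):
        result = classify_occlusion(d_target, d_prev, d_next)
        if result != "combined":
            return result, index  # the block leaves the combined region
    return "combined", None
```

A block that leaves the combined state at some pair is moved to the unoccluded, covered, or uncovered region accordingly, and only the blocks that remain combined for every pair stay in the combined occlusion region.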
[0099] According to the disclosure herein, the above expressions (1) or (2) can be extended and generalized to allow for the flexible use of different reference frames to determine the occlusion detection result for the target block k. For example, the occlusion detector 107 can determine whether the target block is an unoccluded target block, a covered occluded target block, an uncovered occluded target block, or a combined occluded target block relative to the i-th previous frame Pi and the j-th subsequent frame Nj based on the following expression (3):
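[0100]
$$
\mathrm{occlusion}(k, Pi, Nj) =
\begin{cases}
\text{covered}, & \text{if } D_k \le D_{R(k,Pi)} \text{ and } D_k > D_{R(k,Nj)} \\
\text{not covered}, & \text{if } D_k > D_{R(k,Pi)} \text{ and } D_k \le D_{R(k,Nj)} \\
\text{combined}, & \text{if } D_k > D_{R(k,Pi)} \text{ and } D_k > D_{R(k,Nj)} \\
\text{not occluded}, & \text{otherwise}
\end{cases}
\tag{3}
$$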
[0101] In the above expression (3), occlusion(k, Pi, Nj) represents the occlusion detection result of target block k relative to the i-th previous frame Pi and the j-th subsequent frame Nj, where i and j are positive integers. $D_{R(k,Pi)}$ represents the relative depth value of the i-th previous block R(k, Pi) corresponding to the target block k in the i-th previous frame Pi, and $D_{R(k,Nj)}$ represents the relative depth value of the j-th subsequent block R(k, Nj) corresponding to the target block k in the j-th subsequent frame Nj. The i-th previous block R(k, Pi) can be determined by projecting the target block k onto the i-th previous frame Pi based on the motion vector of the target block k relative to the i-th previous frame Pi. The j-th subsequent block R(k, Nj) can be determined by projecting the target block k onto the j-th subsequent frame Nj based on the motion vector of the target block k relative to the j-th subsequent frame Nj.
[0102] In the above expression (3), the "covered" result indicates that the target block k is a covered occluded target block, and a matching block of the target block k can be found in the i-th previous frame Pi, namely the i-th previous block R(k, Pi). The "not covered" result indicates that the target block k is an uncovered occluded target block, and a matching block of the target block k can be found in the j-th subsequent frame Nj, namely the j-th subsequent block R(k, Nj). The "combined" result indicates that the target block k is a combined occluded target block, and no matching block of the target block k can be found in the i-th previous frame Pi and the j-th subsequent frame Nj. The "not occluded" result indicates that the target block k is an unoccluded target block, and two matching blocks of the target block k can be found in the i-th previous frame Pi and the j-th subsequent frame Nj, respectively, which include the i-th previous block R(k, Pi) and the j-th subsequent block R(k, Nj).
[0103] The motion compensation module 109 can be configured to receive a set of motion vectors of the target frame 204 from the motion estimation module 105 and receive occlusion regions detected for the target frame 204 from the occlusion detector 107. The motion compensation module 109 can generate image content of the target frame 204 from multiple reference frames 202 based on the set of motion vectors and the occlusion regions of the target frame 204.
[0104] In some embodiments, the target frame 204 may include an unoccluded region. For each target block within the unoccluded region, the motion compensation module 109 may project the target block onto the plurality of reference frames 202 based on the motion vector of the target block relative to the plurality of reference frames 202 to determine a matching block from the plurality of reference frames 202. If the motion vector has sub-pixel precision, an interpolation filtering process may be used to generate the matching block. The motion compensation module 109 may then generate the image content of the target block by performing a weighted average of the image content of the matching block. For example, a pixel at a specific pixel location in the target block may be equal to the weighted average of pixels at the same pixel location in the matching block.
[0105] For example, referring to Figure 2B, assume that target block 212 is an unoccluded target block relative to the first previous frame 202a and the first subsequent frame 202b. Motion compensation module 109 can project target block 212 onto the first previous frame 202a based on motion vector 222 to obtain matching block 214, and project target block 212 onto the first subsequent frame 202b based on motion vector 224 to obtain matching block 218. Motion compensation module 109 can generate the image content of target block 212 by performing a weighted average operation on the image content of matching block 214 and the image content of matching block 218.
[0106] In another example, assume that target block 212 is a combined occluded target block relative to the first previous frame 202a and the first subsequent frame 202b, and an unoccluded target block relative to the second previous frame 202c and the second subsequent frame 202d. Motion compensation module 109 can project target block 212 onto the second previous frame 202c based on motion vector 226 to obtain matching block 216, and project target block 212 onto the second subsequent frame 202d based on motion vector 228 to obtain matching block 220. Motion compensation module 109 can generate the image content of target block 212 by performing a weighted average operation on the image content of matching block 216 and matching block 220.
[0107] In yet another example, assume that target block 212 is an unoccluded target block relative to the first previous frame 202a and the first subsequent frame 202b, as well as the second previous frame 202c and the second subsequent frame 202d. Motion compensation module 109 can generate the image content of target block 212 by performing a weighted average operation on the image content of matching blocks 214, 216, 218, and 220.
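The weighted averaging described for unoccluded target blocks can be sketched as follows. This is a minimal illustration rather than the disclosed implementation: the function name `blend_matching_blocks`, the use of NumPy, and the default of equal weights are assumptions, since the text does not specify how the weights are chosen.

```python
import numpy as np

def blend_matching_blocks(blocks, weights=None):
    """Per-pixel weighted average of equally sized matching blocks.

    `weights` is hypothetical: the description does not say how weights
    are selected, so equal weights are used when none are given.
    """
    stack = np.stack([np.asarray(b, dtype=np.float64) for b in blocks])
    if weights is None:
        weights = np.ones(len(blocks))
    weights = np.asarray(weights, dtype=np.float64)
    weights = weights / weights.sum()  # normalize so the weights sum to 1
    # Contract the weight vector against the block axis of the stack.
    return np.tensordot(weights, stack, axes=1)

# Stand-ins for matching blocks 214 and 218 (flat gray patches).
block_214 = np.full((4, 4), 100.0)
block_218 = np.full((4, 4), 120.0)
target_212 = blend_matching_blocks([block_214, block_218])
```

Passing four blocks (for example, stand-ins for matching blocks 214, 216, 218, and 220) exercises the multi-reference case in the same way.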
[0108] In some embodiments, the target frame 204 may include a covered occlusion region. For each target block within the covered occlusion region, the motion compensation module 109 may project the target block onto one or more previous frames based on one or more motion vectors of the target block relative to one or more previous frames to determine one or more matching blocks for the target block from one or more previous frames. The motion compensation module 109 may generate the image content of the target block by performing a weighted average operation on the image content of one or more matching blocks.
[0109] For example, referring to Figure 2B, assume that target block 212 is a covered occluded target block relative to the first previous frame 202a and the first subsequent frame 202b. Motion compensation module 109 can project target block 212 onto the first previous frame 202a based on motion vector 222 to obtain matching block 214. Motion compensation module 109 can generate the image content of target block 212 based on the image content of matching block 214 (e.g., the image content of the target block can be the same as the image content of the matching block).
[0110] In another example, assume that target block 212 is a combined occluded target block relative to the first previous frame 202a and the first subsequent frame 202b, and a covered occluded target block relative to the second previous frame 202c and the second subsequent frame 202d. Motion compensation module 109 may project target block 212 onto the second previous frame 202c based on motion vector 226 to obtain matching block 216. Motion compensation module 109 may generate the image content of target block 212 based on the image content of matching block 216.
[0111] In yet another example, assume that target block 212 is a covered occluded target block relative to the first previous frame 202a and the first subsequent frame 202b, as well as relative to the second previous frame 202c and the second subsequent frame 202d. Motion compensation module 109 can project target block 212 onto the first previous frame 202a and the second previous frame 202c based on motion vectors 222 and 226, respectively, to obtain matching blocks 214 and 216. Motion compensation module 109 can generate the image content of target block 212 by performing a weighted average operation on the image content of matching blocks 214 and 216.
[0112] In some embodiments, the target frame 204 may include an uncovered occlusion region. For each target block within the uncovered occlusion region, the motion compensation module 109 may project the target block onto one or more subsequent frames based on one or more motion vectors of the target block relative to the one or more subsequent frames to determine one or more matching blocks for the target block from the one or more subsequent frames. The motion compensation module 109 may then generate the image content of the target block by performing a weighted average operation on the image content of the one or more matching blocks.
[0113] For example, referring to Figure 2B, assume that target block 212 is an uncovered occluded target block relative to the first previous frame 202a and the first subsequent frame 202b. Motion compensation module 109 can project target block 212 onto the first subsequent frame 202b based on motion vector 224 to obtain matching block 218. Motion compensation module 109 can generate the image content of target block 212 based on the image content of matching block 218.
[0114] In another example, assume that target block 212 is a combined occluded target block relative to the first previous frame 202a and the first subsequent frame 202b, and an uncovered occluded target block relative to the second previous frame 202c and the second subsequent frame 202d. Motion compensation module 109 may project target block 212 onto the second subsequent frame 202d based on motion vector 228 to obtain matching block 220. Motion compensation module 109 may generate the image content of target block 212 based on the image content of matching block 220.
[0115] In yet another example, assume that target block 212 is an uncovered occluded target block relative to the first previous frame 202a and the first subsequent frame 202b, as well as the second previous frame 202c and the second subsequent frame 202d. Motion compensation module 109 can generate the image content of target block 212 by performing a weighted average operation on the image content of matching blocks 218 and 220.
[0116] In some embodiments, the target frame 204 may include a combined occlusion region. For each target block within the combined occlusion region, no matching block for the target block can be found in the multiple reference frames 202. In this case, additional processing may be required to interpolate the pixels in the target block. For example, a hole-filling method such as spatial interpolation (e.g., image inpainting) can be used to fill the pixels in the target block. In another example, the target block can be generated by copying a co-located block from the first previous frame or the first subsequent frame. The co-located block can be obtained by projecting the target block onto the first previous frame or the first subsequent frame using a zero motion vector. In yet another example, the target block can be derived as a weighted average of the co-located blocks from both the first previous frame and the first subsequent frame.
[0117] Optionally, for each target block in the combined occlusion region, additional reference frames can be introduced into the multiple reference frames 202, so that operations similar to those described above with reference to the occlusion detector 107 and the motion compensation module 109 can be performed to search for one or more matching blocks in the additional reference frames. Similar descriptions are not repeated here. Then, if one or more matching blocks for the target block can be found in the additional reference frames, the image content of the target block can be generated based on the image content of the one or more matching blocks.
[0118] Figure 3 is a flowchart of an exemplary method 300 for performing FRUC on video data according to embodiments of the present disclosure. Method 300 may be implemented by system 101 (specifically, motion estimation module 105 and occlusion detector 107) and may include steps 302 to 308 as described below. Some steps may be optional in performing the disclosure provided herein. Furthermore, some steps may be performed simultaneously, or in an order different from that shown in Figure 3.
[0119] In step 302, motion estimation module 105 determines a set of motion vectors for the target frame relative to a plurality of reference frames. For example, the target frame may be divided into multiple target blocks. For each target block in the target frame, motion estimation module 105 determines multiple motion vectors for that target block relative to the plurality of reference frames. In another example, motion estimation module 105 may perform operations similar to those described below with reference to Figure 4 to determine the set of motion vectors.
[0120] In step 304, the occlusion detector 107 performs motion vector classification on the motion vector set to generate a target object map for the target frame. For example, the occlusion detector 107 may perform operations similar to those described below with reference to Figure 5 to generate the target object map.
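The choice of source blocks for each occlusion type described above can be summarized in a short sketch. The function `compensate_block`, the string labels, and the use of an unweighted mean are illustrative assumptions; in particular, the combined case below uses only the co-located-block averaging option and does not attempt inpainting.

```python
import numpy as np

def compensate_block(occlusion, prev_blocks, next_blocks,
                     colocated_prev, colocated_next):
    """Pick source blocks by occlusion type, then average them.

    All names are hypothetical. Equal weights are assumed because the
    description leaves the weighting open.
    """
    if occlusion == "unoccluded":
        sources = list(prev_blocks) + list(next_blocks)
    elif occlusion == "covered":
        sources = list(prev_blocks)        # matches exist only in previous frames
    elif occlusion == "uncovered":
        sources = list(next_blocks)        # matches exist only in subsequent frames
    elif occlusion == "combined":
        # No matching block exists: fall back to the zero-motion
        # co-located blocks of the nearest previous and subsequent frames.
        sources = [colocated_prev, colocated_next]
    else:
        raise ValueError(f"unknown occlusion type: {occlusion}")
    return np.mean(np.stack(sources).astype(np.float64), axis=0)

prev_blocks = [np.full((2, 2), 10.0), np.full((2, 2), 20.0)]
next_blocks = [np.full((2, 2), 40.0)]
co_prev, co_next = np.zeros((2, 2)), np.full((2, 2), 8.0)
covered = compensate_block("covered", prev_blocks, next_blocks, co_prev, co_next)
combined = compensate_block("combined", prev_blocks, next_blocks, co_prev, co_next)
```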
[0121] In step 306, the occlusion detector 107 projects a target object map onto multiple reference frames based on a set of motion vectors to generate multiple reference object maps. For example, for each reference frame, the occlusion detector 107 projects multiple target blocks onto the reference frame based on motion vectors of multiple target blocks relative to that reference frame to generate multiple block projections. The occlusion detector 107 combines the multiple block projections to generate a reference object map for each reference frame. The occlusion detector 107 determines one or more relative depth values for one or more objects identified in the reference object maps. As a result, the occlusion detector 107 can generate multiple reference object maps for multiple reference frames, respectively.
[0122] In step 308, the occlusion detector 107 detects occlusion regions in the target frame based on the motion vector set, the target object map, and the multiple reference object maps. For example, the occlusion detector 107 may perform operations similar to those described below with reference to Figures 6A to 6B to determine one or more occlusion detection results for each target block. The occlusion detector 107 can determine the occlusion regions in the target frame based on the occlusion detection results of the multiple target blocks in the target frame.
[0123] Figure 4 is a flowchart of an exemplary method 400 for determining a set of motion vectors for a target frame relative to a plurality of reference frames, according to embodiments of the present disclosure. Method 400 may be implemented by system 101 (specifically, motion estimation module 105) and may include steps 402 to 410 as described below. Some steps may be optional in performing the disclosure provided herein. Furthermore, some steps may be performed simultaneously, or in an order different from that shown in Figure 4.
[0124] In some embodiments, the plurality of reference frames may include a first prior frame preceding the target frame, one or more second prior frames preceding the first prior frame, a first subsequent frame following the target frame, and one or more second subsequent frames following the first subsequent frame.
[0125] In step 402, the motion estimation module 105 divides the target frame into multiple target blocks.
[0126] In step 404, the motion estimation module 105 selects the target block to be processed from multiple target blocks.
[0127] In step 406, motion estimation module 105 determines the motion vector of the target block relative to the first previous frame and the motion vector of the target block relative to the first subsequent frame. For example, motion estimation module 105 may use the bidirectional matching motion estimation technique, the forward motion estimation technique, or the backward motion estimation technique described below with reference to Figures 7 to 8B to determine the motion vectors of the target block relative to the first previous frame and the first subsequent frame.
[0128] In step 408, for each second previous frame, the motion estimation module 105 scales the motion vector of the target block relative to the first previous frame to generate the motion vector of the target block relative to the second previous frame.
[0129] In step 409, for each second subsequent frame, the motion estimation module 105 scales the motion vector of the target block relative to the first subsequent frame to generate the motion vector of the target block relative to the second subsequent frame.
[0130] In step 410, motion estimation module 105 determines whether any target block among the plurality of target blocks remains to be processed. If at least one target block remains to be processed, method 400 may return to step 404 to select one of the remaining target blocks so that the selected target block can be processed. Otherwise, method 400 ends because all target blocks in the target frame have been processed.
[0131] Figure 5 is a flowchart of an exemplary method 500 for generating a target object map for a target frame, according to embodiments of the present disclosure. Method 500 may be implemented by system 101 (specifically, occlusion detector 107) and may include steps 502 to 508 as described below. Some steps may be optional in performing the disclosure provided herein. Furthermore, some steps may be performed simultaneously, or in an order different from that shown in Figure 5.
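The control flow of steps 402 to 410 can be sketched as a loop over target blocks. This is an illustration under several assumptions: blocks lie on a fixed-size grid (the disclosure also contemplates variable block sizes), `estimate_nearest` is a hypothetical callback standing in for whichever technique step 406 uses, and the scale factors for the second previous and second subsequent frames are supplied by the caller as already-computed ratios of time distances.

```python
def run_method_400(height, width, block_size, estimate_nearest,
                   prev_scales, next_scales):
    """Return {block_origin: motion vectors} for one target frame.

    estimate_nearest(origin) -> (mv_p1, mv_n1) is a hypothetical stand-in
    for step 406. prev_scales / next_scales hold one scale factor per
    second previous / second subsequent frame (assumed precomputed).
    """
    # Step 402: divide the target frame into blocks on a regular grid.
    origins = [(y, x)
               for y in range(0, height, block_size)
               for x in range(0, width, block_size)]
    vectors = {}
    # Steps 404 and 410: process one block at a time until none remain.
    for origin in origins:
        mv_p1, mv_n1 = estimate_nearest(origin)            # step 406
        vectors[origin] = {
            "P1": mv_p1,
            "N1": mv_n1,
            # Step 408: scale the P1 vector for each second previous frame.
            "P_extra": [(mv_p1[0] * s, mv_p1[1] * s) for s in prev_scales],
            # Step 409: scale the N1 vector for each second subsequent frame.
            "N_extra": [(mv_n1[0] * s, mv_n1[1] * s) for s in next_scales],
        }
    return vectors

# Illustrative use with a constant motion field and made-up scale factors.
field = run_method_400(8, 8, 4, lambda origin: ((2, -1), (-2, 1)), [2.0], [2.0])
```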
[0132] In step 502, the occlusion detector 107 classifies the motion vector set of the target frame into one or more motion vector groups.
[0133] In step 504, for each group of motion vectors, the occlusion detector 107 determines the object corresponding to the group of motion vectors. As a result, the occlusion detector 107 determines one or more objects for one or more groups of motion vectors.
[0134] In step 506, the occlusion detector 107 generates a target object map to include one or more objects.
[0135] In step 508, the occlusion detector 107 determines one or more relative depth values for one or more objects in the target object map.
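Steps 502 to 506 can be illustrated with a simple grouping pass. The grouping rule here is an assumption made only for illustration: a motion vector joins the first group whose representative vector differs from it by at most `tolerance` in each component. The relative depth determination of step 508 is omitted because its criterion is not given in this passage.

```python
def build_target_object_map(block_mvs, tolerance=1):
    """Group blocks with similar motion vectors into objects.

    block_mvs maps a block index to its (dy, dx) motion vector.
    Returns (object_map, groups), where object_map maps each block to an
    object id and groups[id] is that object's representative vector.
    The greedy thresholding rule is hypothetical, not the disclosed
    classification method.
    """
    groups = []       # step 502: representative vector per group
    object_map = {}   # step 506: block -> object id
    for block, mv in block_mvs.items():
        for object_id, rep in enumerate(groups):
            if (abs(mv[0] - rep[0]) <= tolerance
                    and abs(mv[1] - rep[1]) <= tolerance):
                object_map[block] = object_id   # step 504: reuse the object
                break
        else:
            groups.append(mv)                   # step 504: new object
            object_map[block] = len(groups) - 1
    return object_map, groups

mvs = {(0, 0): (0, 0), (0, 1): (0, 1), (1, 0): (5, 5), (1, 1): (5, 4)}
object_map, groups = build_target_object_map(mvs)
```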
[0136] Figures 6A to 6B are a flowchart of an exemplary method 600 for performing occlusion detection on a target block according to embodiments of the present disclosure. Method 600 may be implemented by system 101 (specifically, occlusion detector 107) and may include steps 602 to 622 as described below. Some steps may be optional in performing the disclosure provided herein. Furthermore, some steps may be performed simultaneously, or in an order different from that shown in Figures 6A to 6B.
[0137] In some embodiments, the plurality of reference frames used herein may include a plurality of previous reference frames, such as a first previous frame preceding the target frame, a second previous frame preceding the first previous frame, a third previous frame preceding the second previous frame, and so on. The plurality of reference frames may also include a plurality of subsequent frames, such as a first subsequent frame following the target frame, a second subsequent frame following the first subsequent frame, a third subsequent frame following the second subsequent frame, and so on.
[0138] Referring to Figure 6A, in step 602, the occlusion detector 107 determines the first previous block in the first previous frame corresponding to the target block based on the motion vector of the target block relative to the first previous frame.
[0139] In step 604, the occlusion detector 107 determines the relative depth value of the first previous block based on the first previous object map of the first previous frame.
[0140] In step 606, the occlusion detector 107 determines the first subsequent block in the first subsequent frame corresponding to the target block based on the motion vector of the target block relative to the first subsequent frame.
[0141] In step 608, the occlusion detector 107 determines the relative depth value of the first subsequent block based on the first subsequent object map of the first subsequent frame.
[0142] In step 610, the occlusion detector 107 determines a first occlusion detection result for the target block based on the relative depth value of the target block, the relative depth value of the first previous block, and the relative depth value of the first subsequent block.
[0143] In step 612, the occlusion detector 107 determines, based on the first occlusion detection result, whether the target block is a combined occluded target block relative to the first previous frame and the first subsequent frame. In response to the target block being a combined occluded target block relative to the first previous frame and the first subsequent frame, method 600 proceeds to step 614 of Figure 6B. Otherwise (e.g., the target block is an unoccluded target block, a covered occluded target block, or an uncovered occluded target block relative to the first previous frame and the first subsequent frame), method 600 ends.
[0144] Referring to Figure 6B, in step 614, the occlusion detector 107 determines the second previous block in the second previous frame corresponding to the target block based on the motion vector of the target block relative to the second previous frame.
[0145] In step 616, the occlusion detector 107 determines the relative depth value of the second previous block based on the second previous object map of the second previous frame.
[0146] In step 618, the occlusion detector 107 determines the second subsequent block in the second subsequent frame corresponding to the target block based on the motion vector of the target block relative to the second subsequent frame.
[0147] In step 620, the occlusion detector 107 determines the relative depth value of the second subsequent block based on the second subsequent object map of the second subsequent frame.
[0148] In step 622, the occlusion detector 107 determines a second occlusion detection result for the target block based on the relative depth value of the target block, the relative depth value of the second previous block, and the relative depth value of the second subsequent block.
[0149] Additionally, the occlusion detector 107 can determine, based on the second occlusion detection result, whether the target block is still an occluded target block relative to the combination of the second previous frame and the second subsequent frame. In response to the target block being an occluded target block relative to the combination of the second previous frame and the second subsequent frame, method 600 can continue to determine a third occlusion detection result for the target block relative to the third previous frame and the third subsequent frame. Similar descriptions will not be repeated here. Otherwise (e.g., the target block is an unoccluded target block relative to the second previous frame and the second subsequent frame, a covered occluded target block, or an uncovered occluded target block), method 600 terminates.
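The escalation in method 600, from the nearest pair of reference frames to progressively more distant pairs, can be sketched as a loop that stops at the first result other than "combined". The per-pair test is passed in as `classify`; the `demo_classify` function below is only an assumed stand-in that treats equal relative depth values as a match, and should not be read as the disclosed expression (3).

```python
def cascade_occlusion_detection(target_depth, depth_pairs, classify):
    """Walk reference-frame pairs from nearest to farthest.

    depth_pairs: [(depth of i-th previous block, depth of i-th subsequent
    block), ...], nearest pair first. Returns (result, level), where level
    is the 1-based index of the pair that produced the result.
    """
    result = "combined"
    for level, (d_prev, d_next) in enumerate(depth_pairs, start=1):
        result = classify(target_depth, d_prev, d_next)
        if result != "combined":
            return result, level      # steps 612 / [0149]: stop early
    return result, len(depth_pairs)   # still combined after all pairs

def demo_classify(d_target, d_prev, d_next):
    """Hypothetical test: a block 'matches' when depth values are equal."""
    in_prev, in_next = d_prev == d_target, d_next == d_target
    if in_prev and in_next:
        return "unoccluded"
    if in_prev:
        return "covered"
    if in_next:
        return "uncovered"
    return "combined"

# Combined relative to the first pair, covered relative to the second pair.
outcome = cascade_occlusion_detection(1, [(2, 3), (1, 2)], demo_classify)
```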
[0150] Figure 7 is a graphical representation illustrating a bidirectional matching motion estimation process 700 according to an embodiment of the present disclosure. In some embodiments, a block matching scheme or an optical flow scheme can be used to estimate the motion vectors of a target frame, and the target frame can be interpolated along the motion trajectories of the motion vectors. Block matching schemes can be easily designed with low computational complexity. Block matching schemes may include bidirectional matching motion estimation techniques, forward motion estimation techniques, backward motion estimation techniques, and the like.
[0151] The bidirectional matching motion estimation technique disclosed herein can be performed on each target block in the target frame to obtain the motion vector of the target block relative to a previous frame and the motion vector of the target block relative to a subsequent frame. In some embodiments, the previous frame and the subsequent frame can be the two reference frames closest to the target frame. For example, the previous frame can be a reference frame that immediately precedes the target frame relative to the display order (or chronological order), and the subsequent frame can be a reference frame that immediately follows the target frame relative to the display order (or chronological order). In some other embodiments, the previous frame can be any reference frame preceding the target frame, and the subsequent frame can be any reference frame following the target frame, which is not limited in the disclosure herein.
[0152] Referring to Figure 7, the motion estimation module 105 can use a bidirectional matching motion estimation technique to determine the motion vectors of the target block 712 in the target frame 702 relative to the previous frame 704a and the subsequent frame 704b. Specifically, the motion estimation module 105 can perform a bidirectional matching search process in the previous frame 704a and the subsequent frame 704b to determine a set of candidate motion vectors for the target block 712. The set of candidate motion vectors may include a first pair of candidate motion vectors and one or more second pairs of candidate motion vectors surrounding the first pair of candidate motion vectors. For example, the first pair of candidate motion vectors may include an initial candidate motion vector (iMV0) relative to the previous frame 704a and an initial candidate motion vector (iMV1) relative to the subsequent frame 704b. An exemplary second pair of candidate motion vectors may include a candidate motion vector (cMV0) relative to the previous frame 704a and a candidate motion vector (cMV1) relative to the subsequent frame 704b.
[0153] The candidate motion vectors in each pair can be symmetrical. For example, in the first pair, the initial candidate motion vector (iMV0) pointing to the previous frame 704a can be opposite to the initial candidate motion vector (iMV1) pointing to the subsequent frame 704b. In the second pair, the candidate motion vector (cMV0) pointing to the previous frame 704a can be opposite to the candidate motion vector (cMV1) pointing to the subsequent frame 704b. The difference between the initial candidate motion vector iMV0 and the candidate motion vector cMV0 can be called the motion vector offset and is denoted as MV_offset. For example, the following expressions (4) to (6) can be established for bidirectional matching motion estimation techniques:
[0154] cMV0 = -cMV1, (4)
[0155] cMV0 = iMV0 + MV_offset, (5)
[0156] cMV1 = iMV1 - MV_offset. (6)
[0157] For each pair of candidate motion vectors, two corresponding reference blocks (e.g., a corresponding previous block and a corresponding subsequent block) can be located from the previous frame 704a and the subsequent frame 704b, respectively. For example, for the first pair of candidate motion vectors (iMV0 and iMV1), the previous block 704 and the subsequent block 706 can be located from the previous frame 704a and the subsequent frame 704b, respectively, for the target block 712. For the second pair of candidate motion vectors (cMV0 and cMV1), the previous block 703 and the subsequent block 707 can be located from the previous frame 704a and the subsequent frame 704b, respectively, for the target block 712.
[0158] Next, for each pair of candidate motion vectors (iMV0 and iMV1, or cMV0 and cMV1), the distortion value (e.g., the sum of absolute differences (SAD) value) between the two corresponding reference blocks can be determined. Then, the pair of candidate motion vectors with the lowest distortion value (e.g., the lowest SAD value) can be determined, and this pair of candidate motion vectors is regarded as the motion vector of the target block 712 relative to the previous frame 704a and the subsequent frame 704b.
[0159] Note that, when determining the motion vector of target block 712 relative to the previous frame 704a and the subsequent frame 704b, the present disclosure uses a distortion metric so that the determined motion vector yields the best match between the two corresponding reference blocks in the previous frame 704a and the subsequent frame 704b. Examples of distortion metrics used herein may include, but are not limited to, the SAD metric, the mean squared error (MSE) metric, or the mean absolute distortion (MAD) metric.
[0160] Figure 8A is a graphical representation illustrating a forward motion estimation process 800 according to an embodiment of the present disclosure. Figure 8B is a graphical representation illustrating a backward motion estimation process 850 according to an embodiment of the present disclosure. The forward motion estimation technique or backward motion estimation technique disclosed herein can be performed for each target block in a target frame to obtain the motion vector of the target block relative to a previous frame and the motion vector of the target block relative to a subsequent frame. In each of the forward and backward motion estimation techniques, different reference blocks are searched in only one of the two reference frames (e.g., the previous frame or the subsequent frame), while a fixed reference block is used in the other of the two reference frames.
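The bidirectional matching search of expressions (4) to (6) can be sketched as follows. This is a simplified illustration with stated assumptions: integer-pixel motion only, a square search window of hypothetical size `radius` around the initial pair, SAD as the distortion metric, and grayscale frames stored as NumPy arrays. The function names are not from the disclosure.

```python
import numpy as np

def sad(a, b):
    """Sum of absolute differences between two equally sized blocks."""
    return float(np.abs(a.astype(np.int64) - b.astype(np.int64)).sum())

def bilateral_search(prev, nxt, y, x, size, init_mv=(0, 0), radius=2):
    """Return (mv0, mv1, cost) for the block with top-left corner (y, x).

    mv0 points into the previous frame and mv1 into the subsequent frame.
    Candidates are symmetric pairs, as in expressions (4) to (6).
    Returns None if no candidate lies fully inside both frames.
    """
    h, w = prev.shape
    best = None
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            mv0 = (init_mv[0] + dy, init_mv[1] + dx)  # cMV0 = iMV0 + MV_offset
            mv1 = (-mv0[0], -mv0[1])                  # cMV1 = -cMV0
            py, px = y + mv0[0], x + mv0[1]
            ny, nx = y + mv1[0], x + mv1[1]
            inside = (0 <= py <= h - size and 0 <= px <= w - size and
                      0 <= ny <= h - size and 0 <= nx <= w - size)
            if not inside:
                continue  # skip candidates that leave either frame
            cost = sad(prev[py:py + size, px:px + size],
                       nxt[ny:ny + size, nx:nx + size])
            if best is None or cost < best[2]:
                best = (mv0, mv1, cost)
    return best

# Synthetic test content: a random texture that moves 2 rows down and
# 2 columns left between the previous and the subsequent frame.
rng = np.random.default_rng(0)
prev_frame = rng.integers(0, 256, size=(32, 32))
next_frame = np.roll(prev_frame, shift=(2, -2), axis=(0, 1))
result = bilateral_search(prev_frame, next_frame, y=12, x=12, size=8)
```

Because the target frame sits midway between the two references, the best symmetric pair for this synthetic motion is half the frame-to-frame displacement in each direction.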
[0161] In some embodiments, in the forward motion estimation technique shown in Figure 8A, the subsequent block 818 in the subsequent frame 804b that is co-located with the target block 812 in the target frame 802 is used as a fixed corresponding reference block for the target block 812, while different previous blocks in the previous frame 804a (e.g., including previous blocks 814, 816) are selected as corresponding reference blocks for the target block 812. The distortion value between the subsequent block 818 in the subsequent frame 804b and each of the different previous blocks in the previous frame 804a can be determined. Then, the previous block with the lowest distortion value can be selected from the different previous blocks, and the motion vector from the subsequent block 818 to the selected previous block can be determined and referred to as $MV_{orig\_FW}$. For example, if the previous block 816 has the lowest distortion value compared to the other previous blocks, then the motion vector $MV_{orig\_FW}$ can be the motion vector 840 pointing from the subsequent block 818 to the previous block 816.
[0162] The motion vector $MV_{orig\_FW}$ can be scaled based on the time distance between the previous frame 804a and the target frame 802, and the time distance between the previous frame 804a and the subsequent frame 804b, to obtain the motion vector of target block 812 relative to the previous frame 804a. According to the disclosure provided herein, the time distance between a first frame and a second frame can be measured as the difference between the timestamp (or display order) of the first frame and the timestamp (or display order) of the second frame. For example, the motion vector of target block 812 relative to the previous frame 804a can be calculated using expressions (7) to (8):
[0163] $MV_{P1}(x) = MV_{orig\_FW}(x) \cdot \frac{T_{P1} - T_{target}}{T_{P1} - T_{N1}}$ (7)
[0164] $MV_{P1}(y) = MV_{orig\_FW}(y) \cdot \frac{T_{P1} - T_{target}}{T_{P1} - T_{N1}}$ (8)
[0165] $MV_{P1}(x)$ and $MV_{P1}(y)$ represent the x and y components, respectively, of the motion vector of target block 812 relative to the previous frame 804a. $MV_{orig\_FW}(x)$ and $MV_{orig\_FW}(y)$ represent the x and y components, respectively, of the motion vector $MV_{orig\_FW}$. $T_{P1}$, $T_{N1}$, and $T_{target}$ represent the timestamps or display order of the previous frame 804a, the subsequent frame 804b, and the target frame 802, respectively. $(T_{P1} - T_{target})$ and $(T_{P1} - T_{N1})$ represent the time distance between the previous frame 804a and the target frame 802, and the time distance between the previous frame 804a and the subsequent frame 804b, respectively.
[0166] Then, the motion vector $MV_{orig\_FW}$ can be scaled based on the time distance between the subsequent frame 804b and the target frame 802, and the time distance between the previous frame 804a and the subsequent frame 804b, to obtain the motion vector of target block 812 relative to the subsequent frame 804b. For example, the motion vector of target block 812 relative to the subsequent frame 804b can be calculated using expressions (9) to (10):
[0167] $MV_{N1}(x) = MV_{orig\_FW}(x) \cdot \frac{T_{N1} - T_{target}}{T_{P1} - T_{N1}}$ (9)
[0168] $MV_{N1}(y) = MV_{orig\_FW}(y) \cdot \frac{T_{N1} - T_{target}}{T_{P1} - T_{N1}}$ (10)
[0169] $MV_{N1}(x)$ and $MV_{N1}(y)$ represent the x and y components, respectively, of the motion vector of target block 812 relative to the subsequent frame 804b. $(T_{N1} - T_{target})$ represents the time distance between the subsequent frame 804b and the target frame 802.
[0170] In some embodiments, in the backward motion estimation technique shown in Figure 8B, the previous block 862 in the previous frame 804a that is co-located with the target block 852 in the target frame 802 is used as a fixed corresponding reference block for the target block 852, while different subsequent blocks in the subsequent frame 804b (e.g., including subsequent blocks 864, 866) are used as corresponding reference blocks for the target block 852. The distortion value between the previous block 862 in the previous frame 804a and each of the different subsequent blocks in the subsequent frame 804b can be determined. Then, the subsequent block with the lowest distortion value can be selected from the different subsequent blocks, and the motion vector from the previous block 862 to the selected subsequent block can be determined and referred to as $MV_{orig\_BW}$. For example, if the subsequent block 866 has the lowest distortion value compared to the other subsequent blocks, then the motion vector $MV_{orig\_BW}$ can be the motion vector 880 pointing from the previous block 862 to the subsequent block 866.
[0171] The motion vector $MV_{orig\_BW}$ can be scaled based on the time distance between the subsequent frame 804b and the target frame 802, and the time distance between the subsequent frame 804b and the previous frame 804a, to obtain the motion vector of the target block 852 relative to the subsequent frame 804b. For example, the motion vector of the target block 852 relative to the subsequent frame 804b can be calculated using expressions (11) to (12):
[0172] $MV_{N1}(x) = MV_{orig\_BW}(x) \cdot \frac{T_{N1} - T_{target}}{T_{N1} - T_{P1}}$ (11)
[0173] $MV_{N1}(y) = MV_{orig\_BW}(y) \cdot \frac{T_{N1} - T_{target}}{T_{N1} - T_{P1}}$ (12)
[0174] $MV_{orig\_BW}(x)$ and $MV_{orig\_BW}(y)$ represent the x and y components, respectively, of the motion vector $MV_{orig\_BW}$. Next, the motion vector $MV_{orig\_BW}$ can be scaled based on the time distance between the previous frame 804a and the target frame 802, and the time distance between the subsequent frame 804b and the previous frame 804a, to obtain the motion vector of target block 852 relative to the previous frame 804a. For example, the motion vector of target block 852 relative to the previous frame 804a can be calculated using expressions (13) to (14):
[0175] $MV_{P1}(x) = MV_{orig\_BW}(x) \cdot \frac{T_{P1} - T_{target}}{T_{N1} - T_{P1}}$ (13)
[0176] $MV_{P1}(y) = MV_{orig\_BW}(y) \cdot \frac{T_{P1} - T_{target}}{T_{N1} - T_{P1}}$ (14)
[0177] Note that when using Figure 7 and Figures 8A to 8B When determining motion vectors for a target block using the techniques described herein, in addition to the aforementioned distortion metrics, bias values can be used to derive a more consistent motion vector field. For example, the spatial correlation between the target block and its neighboring target blocks, as well as the temporal correlation between the target block and its co-located reference blocks in the reference frame, can be considered. Bias values can be calculated based on the differences between candidate motion vectors of the target block and motion vectors from those neighboring target blocks and co-located reference blocks. Bias values can be incorporated into distortion values (e.g., SAD values) to determine the total cost. The candidate motion vector with the lowest total cost can be determined as the motion vector for the target block.
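The total-cost selection described above can be sketched as follows. The specific bias (a sum of L1 differences to neighboring motion vectors) and the weighting factor `weight` are assumptions chosen for illustration; the description only states that bias values are derived from motion vector differences and combined with the distortion values.

```python
def total_cost(sad_value, candidate_mv, reference_mvs, weight=1.0):
    """Distortion plus a smoothness bias (hypothetical formulation).

    reference_mvs holds motion vectors of neighboring target blocks and
    co-located reference blocks. The L1 distance and `weight` are
    illustrative choices, not taken from the disclosure.
    """
    bias = sum(abs(candidate_mv[0] - mv[0]) + abs(candidate_mv[1] - mv[1])
               for mv in reference_mvs)
    return sad_value + weight * bias

def pick_motion_vector(candidates, reference_mvs, weight=1.0):
    """candidates: iterable of (mv, sad_value); return the lowest-cost mv."""
    best = min(candidates,
               key=lambda c: total_cost(c[1], c[0], reference_mvs, weight))
    return best[0]

candidates = [((5, 5), 100.0), ((1, 0), 104.0)]
neighbors = [(1, 0), (1, 1)]
chosen = pick_motion_vector(candidates, neighbors)          # bias applied
unbiased = pick_motion_vector(candidates, neighbors, 0.0)   # SAD only
```

With the bias enabled, the candidate that agrees with its neighbors wins despite a slightly higher SAD, which is the consistency effect the paragraph describes.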
[0178] Figure 9 This is a graphical representation illustrating an exemplary motion vector scaling process 900 according to an embodiment of the present disclosure. In some embodiments, when more than two reference frames are used for FRUC, the motion estimation module 105 may apply the above references. Figure 7 and Figures 8A to 8B One of the described techniques is to estimate the motion vector of each target block relative to a first previous frame and a first subsequent frame. The first previous frame and the first subsequent frame can be, for example, two most recent reference frames (e.g., the most recent previous frame and the most recent subsequent frame). The most recent previous frame can be a frame immediately preceding the target frame. The most recent subsequent frame can be a frame immediately following the target frame. The motion vector of the target block relative to other reference frames can be derived through the motion vector scaling procedure disclosed herein, without applying... Figure 7 and Figures 8A to 8B Any technology, because of Figure 7 and Figures 8A to 8B The computational demands of this technique are substantial. Note that the accuracy of the motion vectors derived through motion vector scaling can be improved by performing local motion estimation.
[0179] Referring to Figure 9, the target frame 902 may be located at position i in display order. The multiple reference frames may include a first previous frame 904a and a first subsequent frame 904b located at positions i-1 and i+1 in display order, respectively. The multiple reference frames may also include another previous frame 906 and another subsequent frame 908 located at positions i-k and i+j in display order, respectively, where k and j are positive integers, and k may or may not be equal to j.
[0180] Initially, any of the techniques of Figure 7 and Figures 8A to 8B can be applied to determine the motion vector of target block 912 relative to the first previous frame 904a (denoted as $MV_{P1}$) and the motion vector of target block 912 relative to the first subsequent frame 904b (denoted as $MV_{N1}$). Then, based on the time distance between another previous frame 906 and the first previous frame 904a, and the time distance between the first previous frame 904a and the target frame 902, the motion vector $MV_{P1}$ can be scaled to another previous frame 906 to determine the motion vector of the target block 912 relative to another previous frame 906 (denoted as $MV_{P2}$). For example, the motion vector $MV_{P2}$ of target block 912 relative to another previous frame 906 can be calculated using expressions (15) to (16):
[0181] $MV_{P2}(x) = MV_{P1}(x) \cdot (T_{P2} - T_{P1}) / (T_{P1} - T_{target})$, (15)
[0182] $MV_{P2}(y) = MV_{P1}(y) \cdot (T_{P2} - T_{P1}) / (T_{P1} - T_{target})$. (16)
[0183] $MV_{P1}(x)$ and $MV_{P1}(y)$ represent the x and y components of the motion vector $MV_{P1}$ of target block 912 relative to the first previous frame 904a. $MV_{P2}(x)$ and $MV_{P2}(y)$ represent the x and y components of the motion vector $MV_{P2}$ of target block 912 relative to another previous frame 906. $T_{P2}$ indicates the timestamp or display order of another previous frame 906, and $(T_{P2} - T_{P1})$ indicates the time distance between another previous frame 906 and the first previous frame 904a. Likewise, $(T_{P1} - T_{target})$ corresponds to the time distance between the first previous frame 904a and the target frame 902.
[0184] Then, based on the time distance between another subsequent frame 908 and the first subsequent frame 904b, and the time distance between the first subsequent frame 904b and the target frame 902, the motion vector $MV_{N1}$ can be scaled to another subsequent frame 908 to determine the motion vector of the target block 912 relative to another subsequent frame 908 (denoted as $MV_{N2}$). For example, the motion vector $MV_{N2}$ of target block 912 relative to another subsequent frame 908 can be calculated using expressions (17) to (18):
[0185] $MV_{N2}(x) = MV_{N1}(x) \cdot (T_{N2} - T_{N1}) / (T_{N1} - T_{target})$, (17)
[0186] $MV_{N2}(y) = MV_{N1}(y) \cdot (T_{N2} - T_{N1}) / (T_{N1} - T_{target})$. (18)
[0187] $MV_{N1}(x)$ and $MV_{N1}(y)$ represent the x and y components of the motion vector $MV_{N1}$ of target block 912 relative to the first subsequent frame 904b. $MV_{N2}(x)$ and $MV_{N2}(y)$ represent the x and y components of the motion vector $MV_{N2}$ of target block 912 relative to another subsequent frame 908. $T_{N2}$ indicates the timestamp or display order of another subsequent frame 908, and $(T_{N2} - T_{N1})$ indicates the time distance between another subsequent frame 908 and the first subsequent frame 904b.
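As a non-limiting illustration, the scaling of expressions (15) to (18) can be sketched in Python as follows. The ratio is transcribed exactly as the expressions are written above; the function and parameter names are hypothetical, and timestamps are assumed to be display-order positions.

```python
def scale_motion_vector(mv_first, t_first, t_other, t_target):
    """Scale a motion vector estimated against the first previous (or first
    subsequent) frame to another previous (or subsequent) frame.

    mv_first : (x, y) motion vector relative to the first reference frame
    t_first  : timestamp / display order of the first reference frame
    t_other  : timestamp / display order of the other reference frame
    t_target : timestamp / display order of the target frame
    """
    denominator = t_first - t_target
    if denominator == 0:
        raise ValueError("first reference frame must differ from target frame")
    # Ratio as written in expressions (15)-(18):
    # (T_other - T_first) / (T_first - T_target)
    ratio = (t_other - t_first) / denominator
    return (mv_first[0] * ratio, mv_first[1] * ratio)


# Hypothetical usage: target at position 10, first previous frame at 9,
# another previous frame at 7 (k = 3).
mv_p2 = scale_motion_vector((-4, 2), t_first=9, t_other=7, t_target=10)
```

The same function serves both the previous and the subsequent direction, since expressions (15) to (16) and (17) to (18) share one form.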
[0188] By performing similar operations on each target block in target frame 902, the motion vectors of all target blocks relative to another previous frame 906 and another subsequent frame 908 can be determined via the motion vector scaling process, without applying any of the computationally intensive techniques of Figure 7 and Figures 8A to 8B. Therefore, more reference frames (e.g., not just the two nearest reference frames) can be used to perform FRUC on the video data. In some embodiments, the motion compensation module 109 may adaptively use different reference frames, instead of only the nearest reference frames, to perform motion compensation operations. For example, the motion compensation operation performed by the motion compensation module 109 can be carried out by performing a weighted average on matching blocks from multiple reference frames, rather than only on the matching blocks from the two nearest reference frames.
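As a non-limiting illustration, a weighted average over matching blocks from several reference frames might be sketched in Python as follows. The weights are supplied by the caller and are purely hypothetical, since the description above does not state how they are chosen; blocks are assumed to be equally sized lists of pixel rows.

```python
def weighted_average_blocks(blocks, weights):
    """Blend equally sized matching blocks (lists of rows) into one block.

    blocks  : list of 2-D blocks, one per contributing reference frame
    weights : one non-negative weight per block (hypothetical values)
    """
    if len(blocks) != len(weights) or not blocks:
        raise ValueError("need one weight per matching block")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    rows, cols = len(blocks[0]), len(blocks[0][0])
    return [
        [sum(w * b[r][c] for b, w in zip(blocks, weights)) / total
         for c in range(cols)]
        for r in range(rows)
    ]
```

For instance, a caller could give larger weights to matching blocks from reference frames that are temporally closer to the target frame, which is one possible design choice among many.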
[0189] Figure 10A is a graphical representation illustrating a process 1000 for generating an exemplary target object map for a target frame according to an embodiment of the present disclosure. Figure 10A shows a target frame 1002, a previous frame 1004a, and a subsequent frame 1004b. For example, suppose two target blocks (shown in image region 1003 of target frame 1002) have the same motion vector relative to previous frame 1004a (e.g., both target blocks move to the left relative to previous frame 1004a at the same speed). Other target blocks in the remaining image region of target frame 1002 may have a zero motion vector relative to previous frame 1004a. Then, the two target blocks in image region 1003 can be identified as object 1008 in target object map 1020, and the other target blocks in the remaining image region of target frame 1002 can be identified as background object 1024 in target object map 1020.
[0190] In another example, the two target blocks in image region 1003 may have the same motion vector relative to subsequent frame 1004b (e.g., both target blocks move to the right at the same speed relative to subsequent frame 1004b). Other target blocks in the remaining image region of target frame 1002 may have a zero motion vector relative to subsequent frame 1004b. Then, the two target blocks in image region 1003 can be identified as object 1008 in target object map 1020, and the other target blocks in the remaining image region of target frame 1002 can be identified as background object 1024 in target object map 1020.
[0191] As a result, object 1008 can be identified as a moving object moving to the left in image region 1003 of target frame 1002. Background object 1024 can be identified in the remaining image region of target frame 1002. Object 1008 can be assigned a first relative depth value, and background object 1024 can be assigned a second relative depth value, wherein the first relative depth value is less than the second relative depth value. Target object map 1020 can be generated to include object 1008 and background object 1024.
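As a non-limiting illustration, generating a target object map by grouping target blocks with identical motion vectors might be sketched in Python as follows. The exact-match grouping follows the example above; the rule used to order relative depth values (larger motion magnitude means smaller depth, so that a zero-motion background is deepest) is a hypothetical assumption for this sketch, as are all names.

```python
from collections import defaultdict


def build_target_object_map(block_mvs):
    """Group target blocks by motion vector and assign relative depths.

    block_mvs : dict mapping block index (row, col) -> motion vector (dx, dy)
    Returns (object_map, depths):
      object_map : dict block index -> object id
      depths     : dict object id -> relative depth (smaller = closer)
    """
    groups = defaultdict(list)
    for block, mv in block_mvs.items():
        groups[mv].append(block)

    # Assumption: faster-moving groups are treated as closer to the viewer,
    # so the zero-motion group (background) receives the largest depth.
    ordered = sorted(groups, key=lambda mv: -(abs(mv[0]) + abs(mv[1])))

    object_map, depths = {}, {}
    for object_id, mv in enumerate(ordered):
        depths[object_id] = object_id + 1
        for block in groups[mv]:
            object_map[block] = object_id
    return object_map, depths
```

In the scenario of Figure 10A, the two blocks sharing a leftward motion vector would form one object with the smaller relative depth value, and all zero-motion blocks would form the background object.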
[0192] Figures 10B to 10D are graphical representations illustrating the generation of an exemplary reference object map for the previous frame 1004a of Figure 10A based on the target object map 1020 of Figure 10A, according to an embodiment of the present disclosure. Referring to Figure 10B, the occlusion detector 107 can project the background object 1024 of the target object map 1020 onto the previous frame 1004a to generate a first object projection in the image region 1032 of the previous frame 1004a. Because the background object 1024 has a zero motion vector, the image region 1032 of the previous frame 1004a can be the same as the image region of the background object 1024 in the target object map 1020.
[0193] Next, referring to Figure 10C, the occlusion detector 107 can project the object 1008 of the target object map 1020 onto the previous frame 1004a based on the motion vectors of the target blocks within the object 1008, so as to generate a second object projection in the image region 1033 of the previous frame 1004a.
[0194] Referring to Figure 10D, for image region 1033 in previous frame 1004a, where the first object projection and the second object projection overlap, the second object projection is selected because it is associated with object 1008, which has a smaller relative depth value than background object 1024. Occlusion detector 107 can determine that image region 1033 in previous frame 1004a is covered by object 1008. As a result, object 1008 is identified in reference object map 1038 of previous frame 1004a. Each reference block in image region 1033 may have the same relative depth value as object 1008.
[0195] For the remaining portion of image region 1032 in the previous frame 1004a that is covered only by the first object projection of background object 1024 (e.g., the remaining portion of image region 1032 = image region 1032 - image region 1033), occlusion detector 107 can determine that the remaining portion of image region 1032 is covered by background object 1024. As a result, background object 1024 is also identified in reference object map 1038 of the previous frame 1004a. Since no object projection is generated for image region 1034 of the previous frame 1004a (e.g., as shown in Figure 10C), image region 1034 can be filled by background object 1024. As a result, background object 1024 is identified in the remaining image region 1040 of the previous frame 1004a, that is, everywhere except image region 1033 (e.g., remaining image region 1040 = the entire image region of the previous frame 1004a - image region 1033). Each reference block in the remaining image region 1040 can be a portion of background object 1024 and has the same relative depth value as background object 1024.
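As a non-limiting illustration, the projection and combination steps of Figures 10B to 10D might be sketched in Python as follows. For simplicity, this sketch assumes that motion vectors are expressed in whole block units as (row offset, column offset) from a target block to its reference block; this convention and all names are hypothetical.

```python
def project_to_reference(object_map, depths, block_mvs, background_id,
                         rows, cols):
    """Build a reference object map from a target object map.

    object_map    : dict target block (row, col) -> object id
    depths        : dict object id -> relative depth (smaller = closer)
    block_mvs     : dict target block -> (d_row, d_col) in block units
    background_id : object id used to fill blocks with no projection
    """
    ref_map = {}
    for block, obj in object_map.items():
        d_row, d_col = block_mvs[block]
        ref = (block[0] + d_row, block[1] + d_col)
        if not (0 <= ref[0] < rows and 0 <= ref[1] < cols):
            continue  # projection falls outside the reference frame
        current = ref_map.get(ref)
        # Where projections overlap, keep the object with the smaller depth.
        if current is None or depths[obj] < depths[current]:
            ref_map[ref] = obj
    # Blocks that received no projection are filled with the background.
    for r in range(rows):
        for c in range(cols):
            ref_map.setdefault((r, c), background_id)
    return ref_map
```

The overlap rule corresponds to image region 1033, and the final fill step corresponds to image region 1034, for which no object projection is generated.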
[0196] Figure 10E is a graphical representation 1050 illustrating the determination of an exemplary occlusion detection result for a target block based on the target object map 1020 of Figure 10A, according to an embodiment of the present disclosure. For each target block in the target frame 1002, the occlusion detector 107 can determine an occlusion detection result for the target block. The occlusion detection result can indicate whether the target block is an occluded target block relative to the previous frame 1004a and the subsequent frame 1004b.
[0197] For example, occlusion detector 107 may determine the previous block 1054 in the previous frame 1004a corresponding to the target block 1052 based on the motion vector of the target block 1052 relative to the previous frame 1004a. Occlusion detector 107 may determine the relative depth value of the previous block 1054 based on the previous object map of the previous frame 1004a (e.g., the reference object map 1038 of Figure 10D). In this example, the relative depth value of the previous block 1054 is equal to the relative depth value of the target block 1052, which is the second relative depth value of the background object 1024. Next, the occlusion detector 107 may determine the subsequent block 1056 in the subsequent frame 1004b corresponding to the target block 1052 based on the motion vector of the target block 1052 relative to the subsequent frame 1004b. The occlusion detector 107 may determine the relative depth value of the subsequent block 1056 based on the subsequent object map of the subsequent frame 1004b. In this example, the relative depth value of the subsequent block 1056 is equal to the first relative depth value of the object 1008, which is less than the relative depth value of the target block 1052.
[0198] Then, the occlusion detector 107 can determine the occlusion detection result of the target block 1052 based on the relative depth value of the target block 1052, the relative depth value of the previous block 1054, and the relative depth value of the subsequent block 1056. For example, since the relative depth value of the target block 1052 is no greater than the relative depth value of the previous block 1054 but greater than the relative depth value of the subsequent block 1056, the occlusion detector 107 can determine that the target block 1052 is an occluded target block relative to the previous frame 1004a and the subsequent frame 1004b. That is, the target block 1052 is exposed in the previous frame 1004a, but is covered by the object 1008 with a smaller relative depth value in the subsequent frame 1004b. The occlusion detector 107 can determine that the matching block of the target block 1052 is the previous block 1054 in the previous frame 1004a.
[0199] Figure 11A is a graphical representation of a process 1100 for determining a first occlusion detection result for a target block according to an embodiment of the present disclosure. A first previous frame 1104a before the target frame 1102 and a first subsequent frame 1104b after the target frame 1102 are shown. The occlusion detector 107 can generate a target object map for the target frame 1102, such that objects 1108 and 1110 and background object 1111 are identified in the target object map. For example, an object 1108 moving to the left is identified in two target blocks of the target frame 1102 and assigned a first relative depth value. An object 1110 moving to the right is identified in six target blocks of the target frame 1102 and assigned a second relative depth value. A background object 1111 with zero motion is identified in the remaining target blocks of the target frame 1102 and assigned a third relative depth value. The first relative depth value is less than the second relative depth value, and the second relative depth value is less than the third relative depth value.
[0200] The occlusion detector 107 can also generate a first previous object map for the first previous frame 1104a, such that objects 1108 and 1110, as well as background object 1111, are also identified in the first previous object map. Similarly, the occlusion detector 107 can generate a first subsequent object map for the first subsequent frame 1104b, such that objects 1108 and 1110, as well as background object 1111, are also identified in the first subsequent object map.
[0201] For each target block in target frame 1102, occlusion detector 107 can determine a first occlusion detection result for the target block. For example, target block 1112 may be covered by background object 1111 in the target object map and may have the third relative depth value. Occlusion detector 107 can determine a first previous block 1114 in the first previous frame 1104a corresponding to target block 1112 based on the motion vector of target block 1112 relative to the first previous frame 1104a. Occlusion detector 107 can determine the relative depth value of the first previous block 1114 based on the first previous object map. For example, since the first previous block 1114 is covered by object 1108 in the first previous object map, the relative depth value of the first previous block 1114 is equal to the first relative depth value of object 1108.
[0202] Next, the occlusion detector 107 can determine the first subsequent block 1116 in the first subsequent frame 1104b corresponding to the target block 1112 based on the motion vector of the target block 1112 relative to the first subsequent frame 1104b. The occlusion detector 107 can determine the relative depth value of the first subsequent block 1116 based on the first subsequent object map. For example, since the first subsequent block 1116 is covered by object 1110 in the first subsequent object map, the relative depth value of the first subsequent block 1116 is equal to the second relative depth value of object 1110.
[0203] Then, the occlusion detector 107 can determine a first occlusion detection result for the target block 1112 based on the relative depth value of the target block 1112, the relative depth value of the first previous block 1114, and the relative depth value of the first subsequent block 1116. For example, since the relative depth value of the target block 1112 is greater than the relative depth value of the first previous block 1114 and also greater than the relative depth value of the first subsequent block 1116, the occlusion detector 107 can determine that the target block 1112 is an occluded target block with a combined occlusion state relative to the first previous frame 1104a and the first subsequent frame 1104b. No matching block for the target block 1112 can be found in the first previous frame 1104a or the first subsequent frame 1104b.
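As a non-limiting illustration, the relative depth comparisons used to determine an occlusion detection result might be sketched in Python as follows. The state labels mirror the covered, uncovered, and combined occlusion states recited in the claims; the function name and return format are hypothetical.

```python
def classify_occlusion(target_depth, prev_depth, next_depth):
    """Return (state, matching_frames) for one target block.

    A reference block with a smaller relative depth than the target block
    means the target block is hidden by a closer object in that frame.
    """
    hidden_in_prev = target_depth > prev_depth
    hidden_in_next = target_depth > next_depth
    if hidden_in_prev and hidden_in_next:
        return "combined", []                # no matching block in either frame
    if hidden_in_next:
        return "covered", ["previous"]       # match only in the previous frame
    if hidden_in_prev:
        return "uncovered", ["subsequent"]   # match only in the subsequent frame
    return "not_occluded", ["previous", "subsequent"]
```

For target block 1052 of Figure 10E (depths 2, 2, 1 for the target, previous, and subsequent blocks), this yields the covered state with the previous block as the matching block. For target block 1112 of Figure 11A (depths 3, 1, 2), it yields the combined state.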
[0204] Figure 11B is a graphical representation of a process 1150 for determining a second occlusion detection result for the target block 1112 of Figure 11A according to an embodiment of the present disclosure. A second previous frame 1105a preceding the first previous frame 1104a and a second subsequent frame 1105b following the first subsequent frame 1104b are shown, and the second previous frame 1105a and the second subsequent frame 1105b are used to determine the second occlusion detection result for target block 1112. Occlusion detector 107 can generate a second previous object map for the second previous frame 1105a, such that object 1110 and background object 1111 are identified in the second previous object map. Similarly, occlusion detector 107 can generate a second subsequent object map for the second subsequent frame 1105b, such that objects 1108 and 1110 and background object 1111 are identified in the second subsequent object map.
[0205] The occlusion detector 107 can determine a second previous block 1118 in the second previous frame 1105a corresponding to the target block 1112 based on the motion vector of the target block 1112 relative to the second previous frame 1105a. The occlusion detector 107 can determine the relative depth value of the second previous block 1118 based on the second previous object map. For example, since the second previous block 1118 is covered by the background object 1111 in the second previous object map, the relative depth value of the second previous block 1118 is equal to the third relative depth value of the background object 1111.
[0206] Next, the occlusion detector 107 can determine the second subsequent block 1120 in the second subsequent frame 1105b corresponding to the target block 1112 based on the motion vector of the target block 1112 relative to the second subsequent frame 1105b. The occlusion detector 107 can determine the relative depth value of the second subsequent block 1120 based on the second subsequent object map. For example, since the second subsequent block 1120 is covered by the background object 1111 in the second subsequent object map, the relative depth value of the second subsequent block 1120 is equal to the third relative depth value of the background object 1111.
[0207] Then, the occlusion detector 107 can determine a second occlusion detection result for the target block 1112 based on the relative depth value of the target block 1112, the relative depth value of the second previous block 1118, and the relative depth value of the second subsequent block 1120. For example, since the relative depth value of the target block is equal to the relative depth value of the second previous block 1118 and the relative depth value of the second subsequent block 1120, the occlusion detector 107 can determine that the target block 1112 is an unoccluded target block relative to the second previous frame 1105a and the second subsequent frame 1105b. The matching blocks of the target block 1112 can be determined as the second previous block 1118 and the second subsequent block 1120.
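As a non-limiting illustration, chaining the first and second occlusion detection results might be sketched in Python as follows. The sketch walks outward from the nearest pair of reference frames and stops at the first pair that offers at least one matching block; this stopping rule, the function name, and the return format are assumptions made for this example.

```python
def find_matching_frames(target_depth, depth_pairs):
    """Search reference frame pairs, nearest first, for matching blocks.

    depth_pairs : list of (prev_depth, next_depth) tuples, where entry 0
                  belongs to the first previous/subsequent frames, entry 1
                  to the second previous/subsequent frames, and so on.
    Returns a list of (direction, level) tuples, or [] if every pair is
    in a combined occlusion state.
    """
    for level, (prev_depth, next_depth) in enumerate(depth_pairs, start=1):
        matches = []
        # A block matches when it is not hidden by a closer object,
        # i.e. the target depth is not greater than the reference depth.
        if target_depth <= prev_depth:
            matches.append(("previous", level))
        if target_depth <= next_depth:
            matches.append(("subsequent", level))
        if matches:
            return matches
    return []
```

For target block 1112, the first pair has depths (1, 2) against a target depth of 3, so no match is found; the second pair has depths (3, 3), so both the second previous block 1118 and the second subsequent block 1120 are returned.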
[0208] Another aspect of this disclosure relates to a non-transitory computer-readable medium storing instructions that, when executed, cause one or more processors to perform the methods described above. The computer-readable medium may include volatile or non-volatile, magnetic, semiconductor-based, magnetic tape-based, optical, removable, non-removable, or other types of computer-readable media or computer-readable storage devices. For example, the computer-readable medium may be a storage device or memory module on which computer instructions are stored, as disclosed. In some embodiments, the computer-readable medium may be a disk or flash drive on which computer instructions are stored.
[0209] It will be apparent to those skilled in the art that various modifications and variations can be made to the disclosed system and related methods. Other embodiments will be apparent to those skilled in the art in light of the description and practice of the disclosed system and related methods.
[0210] The description and examples are intended to be illustrative only, and the true scope is indicated by the appended claims and their equivalents.
Claims
1. A computer-implemented method for performing frame rate upconversion on video data comprising a sequence of image frames, comprising: The video processor determines a set of motion vectors for a target frame relative to multiple reference frames, wherein the target frame will be generated and inserted into the image frame sequence; The video processor performs motion vector classification on the motion vector set to generate a target object map for the target frame; The video processor projects the target object map onto the plurality of reference frames based on the motion vector set to generate a plurality of reference object maps; and The video processor detects occlusion regions in the target frame based on the motion vector set, the target object map, and the plurality of reference object maps. The process of detecting occlusion regions in the target frame by the video processor based on the motion vector set, the target object map, and the plurality of reference object maps includes: Based on the motion vector set, determine the reference block in the plurality of reference frames that corresponds to the target block in the target frame; The relative depth value of the target block is determined based on the target object map; The relative depth value of the reference block is determined based on the plurality of reference object maps; The occlusion detection result of the target block is determined based on the relative depth value of the target block and the relative depth value of the reference block; The occlusion region in the target frame is detected based on the occlusion detection results.
2. The method of claim 1, wherein performing motion vector classification on the motion vector set to generate a target object map for the target frame includes: Perform motion vector classification on the set of motion vectors to detect one or more objects in the target frame; Generate the target object graph to include the one or more objects; and Determine one or more relative depth values for the one or more objects in the target object graph.
3. The method of claim 2, wherein performing motion vector classification on the motion vector set to detect one or more objects in the target frame includes: Classify the set of motion vectors into one or more motion vector groups; and For each motion vector group, determine the object corresponding to that motion vector group.
4. The method of claim 3, wherein determining the object corresponding to the motion vector group comprises: Determine one or more target blocks of the target frame, wherein each of the one or more target blocks has a respective motion vector classified into the motion vector group; and Determine the object as an image region in the target frame that includes the one or more target blocks.
5. The method of claim 1, wherein the target frame comprises a plurality of target blocks, and projecting the target object map onto the plurality of reference frames to generate a plurality of reference object maps comprises: For each reference frame, Project the plurality of target blocks onto the reference frame based on their motion vectors relative to the reference frame to generate a plurality of block projections; and Combine the plurality of block projections based on the target object map to generate a reference object map for the reference frame.
6. The method of claim 5, wherein each of the plurality of target blocks has a variable block size.
7. The method of claim 5, wherein the reference frame is divided into a plurality of reference blocks, and combining the plurality of block projections to generate a reference object map for the reference frame comprises: For a reference block on which two or more block projections of two or more target blocks overlap, Determine the set of relative depth values associated with the two or more target blocks; Determine the minimum relative depth value in the set of relative depth values; Identify, from the two or more block projections, the block projection associated with the target block having the minimum relative depth value; Determine that the reference block is covered by the object associated with the target block having the minimum relative depth value, such that the object is identified in the reference object map; and Determine the relative depth value of the reference block as the relative depth value of the object.
8. The method of claim 1, wherein detecting the occlusion region in the target frame comprises: Detect a set of occluded target blocks from multiple target blocks in the target frame.
9. The method of claim 8, wherein the occlusion region includes a covered occlusion region, an uncovered occlusion region, or a combined occlusion region, wherein the covered occlusion region includes one or more occluded target blocks having a covered occlusion state, the uncovered occlusion region includes one or more occluded target blocks having an uncovered occlusion state, and the combined occlusion region includes one or more occluded target blocks having a combined occlusion state.
10. The method of claim 8, wherein The plurality of reference frames includes a first previous frame before the target frame and a first subsequent frame after the target frame; The plurality of reference object maps includes a first previous object map for the first previous frame and a first subsequent object map for the first subsequent frame; and Detecting the set of occluded target blocks includes: For each target block in the target frame, determine a first occlusion detection result for that target block, wherein the first occlusion detection result indicates whether the target block is an occluded target block relative to the first previous frame and the first subsequent frame.
11. The method of claim 10, wherein determining a first occlusion detection result for the target block includes: Based on the motion vector of the target block relative to the first previous frame, determine the first previous block in the first previous frame that corresponds to the target block; The relative depth value of the first previous block is determined based on the first previous object graph; Based on the motion vector of the target block relative to the first subsequent frame, a first subsequent block in the first subsequent frame corresponding to the target block is determined; The relative depth value of the first subsequent block is determined based on the first subsequent object graph; and The first occlusion detection result of the target block is determined based on the relative depth value of the target block, the relative depth value of the first previous block, and the relative depth value of the first subsequent block.
12. The method of claim 11, wherein determining a first occlusion detection result for the target block includes: In response to the target block having a relative depth value that is not greater than the relative depth value of the first previous block and greater than the relative depth value of the first subsequent block, the target block is determined to be an occluded target block with a covered occlusion state relative to the first previous frame and the first subsequent frame, wherein the matching block of the target block is the first previous block in the first previous frame.
13. The method of claim 11, wherein determining a first occlusion detection result for the target block includes: In response to the target block having a relative depth value that is greater than the relative depth value of the first previous block and not greater than the relative depth value of the first subsequent block, the target block is determined to be an occluded target block with an uncovered occlusion state relative to the first previous frame and the first subsequent frame, wherein the matching block of the target block is the first subsequent block in the first subsequent frame.
14. The method of claim 11, wherein determining a first occlusion detection result for the target block includes: In response to the target block having a relative depth value greater than the relative depth value of the first previous block and also greater than the relative depth value of the first subsequent block, the target block is determined to be an occluded target block with a combined occlusion state relative to the first previous frame and the first subsequent frame, wherein the target block does not have a matching block in the first previous frame and the first subsequent frame.
15. The method of claim 14, wherein The plurality of reference frames also includes a second previous frame preceding the first previous frame and a second subsequent frame following the first subsequent frame; The plurality of reference object maps further includes a second previous object map for the second previous frame and a second subsequent object map for the second subsequent frame; and The method further includes: A second occlusion detection result is determined for the target block, wherein the second occlusion detection result indicates whether the target block is an occluded target block relative to the second previous frame and the second subsequent frame.
16. The method of claim 1, wherein The plurality of reference frames includes a first previous frame before the target frame and a first subsequent frame after the target frame; and Determining the motion vector set of the target frame relative to the plurality of reference frames includes: Divide the target frame into multiple target blocks; and For each target block, determine the motion vector of the target block relative to the first previous frame and the motion vector of the target block relative to the first subsequent frame.
17. The method of claim 16, wherein The plurality of reference frames also include one or more second previous frames preceding the first previous frame and one or more second subsequent frames following the first subsequent frame; and Determining the motion vector set of the target frame relative to the plurality of reference frames, respectively, further includes: For each second previous frame, the motion vector of the target block relative to the first previous frame is scaled to generate the motion vector of the target block relative to the second previous frame; and For each second subsequent frame, the motion vector of the target block relative to the first subsequent frame is scaled to generate the motion vector of the target block relative to the second subsequent frame.
18. A system for performing frame rate upconversion on video data comprising a sequence of image frames, comprising: A memory configured to store the image frame sequence; and A video processor coupled to the memory and configured to: Determine a set of motion vectors for a target frame relative to multiple reference frames, wherein the target frame will be generated and inserted into the image frame sequence; Perform motion vector classification on the set of motion vectors to generate a target object map for the target frame; Project the target object map onto the plurality of reference frames based on the motion vector set to generate a plurality of reference object maps; and Detect occlusion regions in the target frame based on the motion vector set, the target object map, and the plurality of reference object maps. The detection of occlusion regions in the target frame based on the motion vector set, the target object map, and the plurality of reference object maps includes: Based on the motion vector set, determine the reference block in the plurality of reference frames that corresponds to the target block in the target frame; The relative depth value of the target block is determined based on the target object map; The relative depth value of the reference block is determined based on the plurality of reference object maps; The occlusion detection result of the target block is determined based on the relative depth value of the target block and the relative depth value of the reference block; The occlusion region in the target frame is detected based on the occlusion detection results.
19. The system of claim 18, wherein, to perform motion vector classification on the set of motion vectors to generate the target object map for the target frame, the video processor is further configured to: perform motion vector classification on the set of motion vectors to detect one or more objects in the target frame; generate the target object map to include the one or more objects; and determine one or more relative depth values of the one or more objects in the target object map.
20. A non-transitory computer-readable storage medium configured to store instructions which, when executed by a video processor, cause the video processor to perform a process for performing frame rate upconversion on video data comprising a sequence of image frames, the process comprising: determining a set of motion vectors of a target frame relative to a plurality of reference frames, wherein the target frame is to be generated and inserted into the sequence of image frames; performing motion vector classification on the set of motion vectors to generate a target object map for the target frame; projecting the target object map onto the plurality of reference frames based on the set of motion vectors to generate a plurality of reference object maps; and detecting an occlusion region in the target frame based on the set of motion vectors, the target object map, and the plurality of reference object maps, wherein detecting the occlusion region in the target frame based on the set of motion vectors, the target object map, and the plurality of reference object maps comprises: determining, based on the set of motion vectors, a reference block in the plurality of reference frames that corresponds to a target block in the target frame; determining a relative depth value of the target block based on the target object map; determining a relative depth value of the reference block based on the plurality of reference object maps; determining an occlusion detection result of the target block based on the relative depth value of the target block and the relative depth value of the reference block; and detecting the occlusion region in the target frame based on the occlusion detection result.
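By way of illustration only, the motion vector scaling recited in claim 17 and the depth-based occlusion determination recited in claims 18 and 20 may be sketched as follows. The sketch rests on two assumptions that the claims do not state: that scaling is linear in the temporal distance between the target frame and each reference frame, and that a target block is treated as occluded in a reference frame when the relative depth value of the corresponding reference block differs from that of the target block. The function names `scale_motion_vector` and `detect_block_occlusion`, the frame labels, and the numeric values are hypothetical.

```python
from typing import Dict, Tuple

MotionVector = Tuple[float, float]  # (horizontal, vertical) displacement in pixels


def scale_motion_vector(
    mv: MotionVector, base_distance: float, new_distance: float
) -> MotionVector:
    """Scale a motion vector measured over base_distance to new_distance.

    ASSUMPTION: linear scaling by the ratio of temporal distances. The claims
    only say the vector is "scaled"; the exact rule is not given there.
    """
    if base_distance == 0:
        raise ValueError("base_distance must be non-zero")
    factor = new_distance / base_distance
    return (mv[0] * factor, mv[1] * factor)


def detect_block_occlusion(
    target_depth: int, reference_depths: Dict[str, int]
) -> Dict[str, bool]:
    """Return, per reference frame, whether the target block is occluded there.

    ASSUMPTION: the block is flagged as occluded in a reference frame when the
    relative depth of the matched reference block differs from the relative
    depth of the target block (a different object lies at that position).
    """
    return {
        frame: depth != target_depth
        for frame, depth in reference_depths.items()
    }


# Hypothetical example: the target frame lies halfway between two input
# frames, so the first previous frame is 0.5 frame intervals away and the
# second previous frame is 1.5 frame intervals away.
mv_first_previous = (-2.0, 1.0)
mv_second_previous = scale_motion_vector(mv_first_previous, 0.5, 1.5)
print(mv_second_previous)  # (-6.0, 3.0)

# Hypothetical relative depth values read from the target object map and
# from the reference object maps at the matched reference blocks.
result = detect_block_occlusion(2, {"first_previous": 2, "first_subsequent": 1})
print(result)  # {'first_previous': False, 'first_subsequent': True}
```

In this sketch, a block flagged as occluded in one reference frame but not in another would be interpolated only from the frame in which it is visible; how the per-block results are combined into the occlusion region of the target frame is left open here.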