Decomposing a video sequence into VOPs is a very difficult task, and comparatively little research has been undertaken in this field.
An intrinsic problem of VOP generation is that objects of interest are not homogeneous with respect to low-level features such as color, intensity, or optical flow.
Thus, conventional segmentation algorithms will fail to obtain meaningful partitions.
From (1), it can also be seen that apparent motion is highly sensitive to noise because it depends on image derivatives, which can lead to grossly incorrect results.
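To illustrate why derivatives amplify noise, the following minimal NumPy sketch compares the error that additive noise introduces into a 1-D signal with the error it introduces into a forward-difference estimate of that signal's derivative. The sinusoidal test signal, the noise level, and the function name `derivative_error` are illustrative assumptions; the sketch only stands in for the image derivatives used by gradient-based motion estimation.

```python
import numpy as np

def derivative_error(noise_std: float, n: int = 1000, seed: int = 0) -> tuple[float, float]:
    """Return (signal RMS error, derivative RMS error) caused by additive noise.

    The clean signal is a smooth sinusoid; its derivative is estimated with a
    first-order forward difference, a stand-in for image derivatives.
    """
    rng = np.random.default_rng(seed)
    x = np.linspace(0.0, 2.0 * np.pi, n)
    dx = x[1] - x[0]
    clean = np.sin(x)
    noisy = clean + rng.normal(0.0, noise_std, size=n)

    d_clean = np.diff(clean) / dx
    d_noisy = np.diff(noisy) / dx

    signal_err = float(np.sqrt(np.mean((noisy - clean) ** 2)))
    deriv_err = float(np.sqrt(np.mean((d_noisy - d_clean) ** 2)))
    return signal_err, deriv_err

sig, der = derivative_error(0.01)
print(f"signal error {sig:.4f}, derivative error {der:.4f}")
```

The difference of two independent noise samples has a standard deviation of about √2·σ, and it is then divided by the small sample spacing Δx, so the derivative error is roughly √2/Δx times the signal error.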
Unfortunately, we can only observe apparent motion.
In addition to the difficulties mentioned above, motion estimation algorithms have to solve the so-called occlusion and aperture problems.
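The occlusion problem refers to the fact that no correspondence vectors exist for covered and uncovered background.
The aperture problem states that the number of unknowns is larger than the number of observations; for gradient-based methods, for example, each pixel yields only one constraint on the two components of its flow vector, so locally only the component along the intensity gradient can be determined.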
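A minimal sketch of the aperture problem, assuming the standard brightness-constancy (optical flow) constraint Ix·u + Iy·v + It = 0 used by gradient-based methods; whether this matches equation (1) of the document is an assumption, and the derivative values below are hypothetical. One pixel supplies one such equation for the two unknowns u and v, so only the "normal flow" along the gradient is determined, and any component along the edge leaves the constraint satisfied.

```python
import numpy as np

def normal_flow(ix: float, iy: float, it: float) -> np.ndarray:
    """Return the flow component along the intensity gradient (the normal flow).

    This is the only part of (u, v) that the single constraint
    ix*u + iy*v + it = 0 at one pixel can determine.
    """
    grad = np.array([ix, iy], dtype=float)
    norm_sq = float(grad @ grad)
    if norm_sq == 0.0:
        raise ValueError("zero gradient: no flow component is observable")
    return -it * grad / norm_sq

def constraint_residual(ix: float, iy: float, it: float, flow) -> float:
    """Residual of the optical flow constraint for a candidate flow vector (u, v)."""
    return ix * flow[0] + iy * flow[1] + it

# Hypothetical derivatives at one pixel on a vertical edge (gradient along x).
ix, iy, it = 2.0, 0.0, -4.0
v_n = normal_flow(ix, iy, it)
tangent = np.array([-iy, ix])  # along the edge, perpendicular to the gradient

# Every candidate below satisfies the constraint equally well (residual 0),
# so the tangential component cannot be recovered from this pixel alone.
for s in (0.0, 0.5, -1.5):
    candidate = v_n + s * tangent
    print(candidate, constraint_residual(ix, iy, it, candidate))
```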
1. Nonparametric representation, in which a dense field is estimated where each pixel is assigned a correspondence or flow vector. A common technique of this kind is block matching, in which the current frame is subdivided into blocks of equal size, and for each block the best match in the next (or previous) frame is computed. All pixels of a block are assumed to undergo the same translation and are assigned the same correspondence vector. The selection of the block size is crucial. Block matching is unable to cope with rotations and deformations; nevertheless, its simplicity and relative robustness make it a popular technique. Nonparametric representations are not suitable for segmentation, because an object moving in 3-D space generates a spatially varying 2-D motion field even within the same region, except in the simple case of pure translation. This is why parametric models are commonly used in segmentation algorithms. However, dense field estimation is often the first step in calculating the model parameters.
2. Parametric models require a segmentation of the scene, which is our ultimate goal, and describe the motion of each region by a small set of parameters. The motion vectors can then be synthesized from these model parameters. A parametric representation is more compact than a dense field description and less sensitive to noise, because many pixels are treated jointly to estimate a few parameters.
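The block matching procedure from item 1 can be sketched as an exhaustive search with a sum-of-absolute-differences (SAD) cost. The SAD criterion, the 8×8 block size, the ±4-pixel search range, and the function name are assumptions chosen for illustration; practical coders may use other matching criteria and faster search strategies.

```python
import numpy as np

def block_matching(cur: np.ndarray, ref: np.ndarray, block: int = 8, search: int = 4) -> np.ndarray:
    """Exhaustive-search block matching using a sum-of-absolute-differences cost.

    Returns an array of shape (H // block, W // block, 2) with one (dy, dx)
    vector per block, pointing from the block in `cur` to its best match in
    `ref`. All pixels of a block share that single translational vector.
    """
    h, w = cur.shape
    vectors = np.zeros((h // block, w // block, 2), dtype=int)
    for by in range(h // block):
        for bx in range(w // block):
            y0, x0 = by * block, bx * block
            target = cur[y0:y0 + block, x0:x0 + block].astype(np.int64)
            best_cost, best = None, (0, 0)
            for dy in range(-search, search + 1):
                for dx in range(-search, search + 1):
                    y, x = y0 + dy, x0 + dx
                    if y < 0 or x < 0 or y + block > h or x + block > w:
                        continue  # candidate lies partly outside the reference frame
                    cand = ref[y:y + block, x:x + block].astype(np.int64)
                    cost = int(np.abs(target - cand).sum())
                    if best_cost is None or cost < best_cost:
                        best_cost, best = cost, (dy, dx)
            vectors[by, bx] = best
    return vectors

# Usage: shift a random frame by a pure translation and recover it.
rng = np.random.default_rng(0)
ref = rng.integers(0, 256, size=(32, 32)).astype(np.uint8)
cur = np.roll(ref, shift=(2, -1), axis=(0, 1))  # cur[y, x] == ref[y - 2, x + 1]
vectors = block_matching(cur, ref)
print(vectors[1, 1])  # interior block: expected displacement (dy, dx) = (-2, 1)
```

Because each block receives exactly one translation vector, the sketch inherits the limitation noted in item 1: rotating or deforming content has no single correct (dy, dx).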
Although parametric representations are less noise sensitive, they still suffer from the intrinsic problems of motion estimation.
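The noise-averaging argument in item 2 can be illustrated with a six-parameter affine motion model, one common parametric choice (assumed here; the exact model used elsewhere in this document may differ). The sketch fits u = a1 + a2·x + a3·y and v = a4 + a5·x + a6·y to a noisy dense flow field by least squares with NumPy, then compares the error of the raw vectors with that of the vectors synthesized from the six fitted parameters. The parameter values, noise level, and function names are hypothetical.

```python
import numpy as np

def design_matrix(xs: np.ndarray, ys: np.ndarray) -> np.ndarray:
    """Rows [1, x, y] shared by both flow components of the affine model."""
    return np.column_stack([np.ones_like(xs), xs, ys])

def fit_affine(xs, ys, us, vs) -> np.ndarray:
    """Least-squares estimate of (a1..a6) for u = a1 + a2*x + a3*y, v = a4 + a5*x + a6*y."""
    a = design_matrix(xs, ys)
    p_u, *_ = np.linalg.lstsq(a, us, rcond=None)
    p_v, *_ = np.linalg.lstsq(a, vs, rcond=None)
    return np.concatenate([p_u, p_v])

def synthesize(params: np.ndarray, xs, ys):
    """Motion vectors generated from the six model parameters."""
    a = design_matrix(xs, ys)
    return a @ params[:3], a @ params[3:]

# Hypothetical region: a 32x32 block of pixels with known affine motion plus noise.
rng = np.random.default_rng(1)
ys, xs = np.mgrid[0:32, 0:32]
xs, ys = xs.ravel().astype(float), ys.ravel().astype(float)
true_params = np.array([1.0, 0.02, -0.01, -0.5, 0.01, 0.03])
u, v = synthesize(true_params, xs, ys)
us = u + rng.normal(0.0, 0.5, u.shape)  # noisy "dense field" estimates
vs = v + rng.normal(0.0, 0.5, v.shape)

est = fit_affine(xs, ys, us, vs)
u_hat, v_hat = synthesize(est, xs, ys)
dense_rms = np.sqrt(np.mean((us - u) ** 2 + (vs - v) ** 2))
model_rms = np.sqrt(np.mean((u_hat - u) ** 2 + (v_hat - v) ** 2))
print(f"dense-field error {dense_rms:.3f}, affine-model error {model_rms:.3f}")
```

With 1024 pixels jointly determining six parameters, the synthesized vectors are much less noisy than the raw ones; the fit, however, presupposes that all pixels belong to one region, which is exactly the segmentation being sought.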
The major drawbacks of this proposal are its computational complexity and the need to specify the number of objects likely to be found.
The techniques of Adiv, Bouthemy and Francois, and Murray and Buxton use only optical flow data in the segmentation decision; hence, their performance is limited by the accuracy of the estimated flow field.
These results are unsatisfactory because they exhibit over-segmentation, and the method is computationally expensive.
These approaches suffer from high computational complexity, and many algorithms need the number of objects or regions in the scene as an input parameter.
The result is an over-segmentation.
A drawback of this technique is the lack of temporal correspondence to enforce continuity in time.
However, due to its nature, the watershed algorithm suffers from the problems associated with region-growing techniques.
Thus, the above techniques will fail in many practical situations where objects do not correspond to partitions based on simple features like motion or color.
If a sequence shows different views of the same object, that object cannot be represented by a single image that is warped from frame to frame.
Further, the affine transformation (6) might not be able to describe the motion of a complete layer in the presence of strongly non-rigid motion, such as a person walking.
Finally, the layer construction process makes real-time execution impossible, because a longer sequence of frames is required.
It is not accurate and the segments are too big.
Optical flow or motion fields could be used, but they are extremely noise sensitive, and their accuracy is limited by the aperture and occlusion problems.
Partitioning a video sequence into VOPs by means of automatic or semiautomatic segmentation is, in many cases, a very challenging task.
At the moment, we are not aware of any algorithm that can automatically perform VOP segmentation accurately and reliably for generic video sequences.
The main difficulty is to formulate semantic concepts in a form suitable for a segmentation algorithm.
CDMs have some major drawbacks for VOP segmentation.
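The acronym CDM is not expanded in this section; it is assumed here to mean change detection mask, i.e., a binary mask obtained by thresholding the difference between consecutive frames. Under that assumption, the following NumPy sketch (the threshold of 20, the synthetic square, and the function name are hypothetical) shows two typical problems on a simple moving object: background uncovered by the motion is marked as changed, while the object's uniform interior that overlaps its previous position is not.

```python
import numpy as np

def change_detection_mask(prev: np.ndarray, cur: np.ndarray, threshold: int = 20) -> np.ndarray:
    """Binary mask of pixels whose absolute frame difference exceeds `threshold`."""
    diff = np.abs(cur.astype(np.int32) - prev.astype(np.int32))
    return diff > threshold

# Synthetic frames: a uniform bright square moves 3 pixels to the right.
prev = np.zeros((20, 20), dtype=np.uint8)
cur = np.zeros_like(prev)
prev[5:10, 5:10] = 200
cur[5:10, 8:13] = 200

mask = change_detection_mask(prev, cur)
obj = np.zeros(mask.shape, dtype=bool)
obj[5:10, 8:13] = True  # true object support in the current frame

uncovered_marked = mask & ~obj  # uncovered background wrongly flagged as changed
object_missed = obj & ~mask     # object pixels whose intensity did not change
print(mask.sum(), uncovered_marked.sum(), object_missed.sum())  # 30 15 10
```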
The estimated flow field, on the other hand, demonstrates how difficult it can be to group pixels into objects based on the similarity of their flow vectors.
However, transmission channels frequently add corrupting noise and have limited bandwidth (for example, in cellular telephony and wireless networking).
However, the foregoing MPEG compression methods produce a number of unacceptable artifacts, such as blockiness and unnatural object motion, when operated at very low bit rates.
Usually the boundaries of the coding blocks do not correspond to the physical boundaries of the moving objects, and hence visually annoying artifacts result.
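The mismatch between block boundaries and object boundaries can be illustrated with an extreme, hypothetical stand-in for very-low-bit-rate coding: each 8×8 block is replaced by its mean value, roughly what remains when only the DC coefficient of a block transform survives quantization. Applied to a smooth ramp that contains no edges at all, this produces intensity steps exactly at the block boundaries. The ramp, block size, and function name are assumptions for illustration; the sketch is not an MPEG codec.

```python
import numpy as np

def keep_block_means(img: np.ndarray, block: int = 8) -> np.ndarray:
    """Replace every block x block tile by its mean (a crude 'DC-only' reconstruction)."""
    h, w = img.shape
    out = np.empty((h, w), dtype=float)
    for y in range(0, h, block):
        for x in range(0, w, block):
            out[y:y + block, x:x + block] = img[y:y + block, x:x + block].mean()
    return out

# A smooth horizontal ramp: intensity rises by 4 per column, with no object edges.
img = np.tile(np.arange(32, dtype=float) * 4.0, (16, 1))
rec = keep_block_means(img)

orig_steps = np.abs(np.diff(img, axis=1))
rec_steps = np.abs(np.diff(rec, axis=1))
print(orig_steps.max(), rec_steps.max())  # 4.0 and 32.0
print(np.nonzero(rec_steps[0])[0])        # indices 7, 15, 23: the block boundaries
```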
Unnatural motion arises when the limited bandwidth forces the frame rate to fall below that required for smooth motion.
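The relation between bandwidth and frame rate is simple arithmetic: if a coder needs on average b bits per frame and the channel carries R bits per second, at most R / b frames per second can be sent. The sketch uses hypothetical numbers (4000 bits per frame and a few channel rates) purely for illustration.

```python
def max_frame_rate(channel_bps: float, bits_per_frame: float) -> float:
    """Highest frame rate a channel can sustain at a given average coded frame size."""
    if bits_per_frame <= 0:
        raise ValueError("bits_per_frame must be positive")
    return channel_bps / bits_per_frame

BITS_PER_FRAME = 4000  # hypothetical average size of one coded frame
for rate in (64_000, 24_000, 9_600):
    print(f"{rate} bit/s -> {max_frame_rate(rate, BITS_PER_FRAME):.1f} frames/s")
# 64000 bit/s -> 16.0 frames/s
# 24000 bit/s -> 6.0 frames/s
# 9600 bit/s -> 2.4 frames/s
```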
The compressed video data is then transmitted over communication channels, which are prone to errors.
Unless suitably dealt with, this can result in noticeable degradation of the picture quality.
However, error protection schemes come at the price of an increased bit rate.
Moreover, it is not possible to correct all possible errors using a given error-control code.
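Both points can be seen with the Hamming (7,4) code, chosen here only as a familiar example (the document does not name a specific code). Every 4 data bits are sent as 7 code bits, so the bit rate grows by a factor of 7/4 = 1.75; in exchange, any single bit error in a codeword is corrected, but two errors in the same codeword lead to a wrong decoding. The function names are illustrative.

```python
def hamming74_encode(d):
    """Encode 4 data bits as a 7-bit codeword (parity bits at positions 1, 2, 4)."""
    d1, d2, d3, d4 = d
    p1 = d1 ^ d2 ^ d4
    p2 = d1 ^ d3 ^ d4
    p4 = d2 ^ d3 ^ d4
    return [p1, p2, d1, p4, d2, d3, d4]

def hamming74_decode(c):
    """Correct up to one bit error via the syndrome and return the 4 data bits."""
    c = list(c)
    s1 = c[0] ^ c[2] ^ c[4] ^ c[6]
    s2 = c[1] ^ c[2] ^ c[5] ^ c[6]
    s4 = c[3] ^ c[4] ^ c[5] ^ c[6]
    pos = s1 + 2 * s2 + 4 * s4  # 1-based position of a single error, 0 if none
    if pos:
        c[pos - 1] ^= 1
    return [c[2], c[4], c[5], c[6]]

def flip(bits, positions):
    """Return a copy of `bits` with the given 0-based positions inverted."""
    out = list(bits)
    for p in positions:
        out[p] ^= 1
    return out

data = [1, 0, 1, 1]
code = hamming74_encode(data)
print("rate overhead:", len(code) / len(data))                   # 1.75
print("1 error :", hamming74_decode(flip(code, [4])) == data)    # True
print("2 errors:", hamming74_decode(flip(code, [0, 1])) == data) # False
```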
In fact, a typical channel over which compressed video is transmitted, such as a wireless channel, is characterized by high random bit error rates (BER) and multiple burst errors.
Problems arise when codes are used over channels prone to burst errors because the errors tend to be clustered in a small number of received symbols.
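A small counting sketch makes the clustering effect concrete. It assumes a block code that corrects up to t = 1 error in each 7-bit codeword (as the Hamming (7,4) code does) and uses hypothetical error positions: four scattered errors land in four different codewords and are all correctable, whereas a burst of four consecutive errors falls inside one codeword and exceeds its correction capability.

```python
from collections import Counter

def uncorrectable_codewords(error_bits, n: int, t: int) -> int:
    """Count codewords (n bits each) that contain more than t bit errors."""
    per_codeword = Counter(b // n for b in error_bits)
    return sum(1 for count in per_codeword.values() if count > t)

N, T = 7, 1  # codeword length and correctable errors per codeword (assumed)
scattered = [3, 10, 17, 24]  # one error in each of codewords 0..3
burst = [14, 15, 16, 17]     # four consecutive errors, all in codeword 2

print(uncorrectable_codewords(scattered, N, T))  # 0
print(uncorrectable_codewords(burst, N, T))      # 1
```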
None of these prior-art patents is capable of robust and stable automatic object extraction and segmentation.