Automatic object extraction

Active Publication Date: 2006-08-01
Owner: F POSZAT HU

AI Technical Summary

Benefits of technology

[0043] The proposals in (Mech and Wollborn, 1997; Neri et al., Signal Processing, 1998) employ change detection masks (CDMs) and create one object for each area in the frame that is moving differently from the background. A spatial morphological segmentation technique is presented in Choi et al., m2091, 1997. The foreground / background decision is also made based on a CDM: regions for which a majority of pixels are classified as changed are assigned to the foreground. In Choi et al., m3349, 1998, and Gu and Lee, ICIP'97, 1997, a user initially has to select objects in the scene by manual segmentation. These VOPs are then tracked and updated in successive frames. The usefulness of user interaction for incorporating high-level information has also been reported in Colonnese and Russo, m3320, 1998. The performance of the segmentation algorithm is improved by letting a user tune a few crucial parameters on a frame-by-frame basis. In addition, the user is able to select an area containing the object of interest. This allows the algorithm to estimate critical parameters only on the region containing the object, instead of on the whole image, which might consist of several regions with different characteristics.
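
By way of illustration, the majority-vote rule described above can be sketched in a few lines of Python. This is a minimal sketch only; the function name, its arguments, and the one-half threshold are illustrative assumptions, not code from the cited works.

```python
import numpy as np

def foreground_by_majority(cdm, labels):
    """Assign each spatially segmented region to the foreground if a
    majority of its pixels are marked as changed in the CDM.

    cdm    -- boolean change detection mask (True = pixel changed)
    labels -- integer label image from a spatial segmentation
    Hypothetical helper for illustration, not from the cited papers.
    """
    fg = np.zeros_like(cdm, dtype=bool)
    for region in np.unique(labels):
        mask = labels == region
        # majority vote: more than half of the region's pixels changed
        if cdm[mask].mean() > 0.5:
            fg |= mask
    return fg
```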

Video Compression

[0061] The present invention successfully addresses the shortcomings of the presently known configurations by providing a method of automatic object extraction for segmentation of video frames that is automatic, robust, and independent of the nature of the video images. The method of the present invention is based on algorithms that are fast, do not consume a lot of computer resources, do not depend on predefined parameters and data, and do not produce over-segmentation. The method and algorithms of the present invention thus enable and provide: adaptive bit allocation for video compression, interactive TV, efficient image representation, quality of service (QoS) and differentiated services (DiffServ) over diverse communication networks (narrow and broad band), video streaming, surveillance, gaming, web caching, video mail and unified messaging. In addition, working with objects enables application of transform codings that are not based on square blocks of pixels such as 8×8 or 16×16, but use different sizes and shapes of blocks to cover the image, thus reflecting the activities in the image through the locations and shapes of the objects. The main problem in the current compression methods (MPEG-1, 2, 4, H.263, H.26L) lies in the fact that blocks are chosen independently of their relation to the nature of the pixels. Thus, a single block can belong to both the border of the object and to the background. Therefore, while the background is changing, each movement of the object will lead to poor block matching. The present invention enables differentiation between the (static) background and the (moving) foreground objects. To overcome this problem while staying within the framework of the standard, we can reorder the blocks in the following way: a block will be placed on either the object or the background. This will be accomplished by assigning different block sizes. Outside the MPEG-4 framework, segmented and / or extracted objects can be used to automate editing work, e.g. in interactive television. The segmentation and object extraction techniques will serve as a starting point for commercial MPEG-7 multimedia databases. The ability to separate background from foreground enables better automatic calibration and lighting correction between frames. Every video camera has a sensitive sensor that measures the amount of light needed for every frame, and then tunes the camera parameters (exposure time, shutter size). One of the major problems in video shooting is the inability to determine which area will be sampled by the sensor. Sampling a non-object area can lead to very poor object exposure in the final image. Therefore, most cameras assume that calibration is performed on the center of the frame. This is an obvious problem that degrades the quality of the final images.
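
The block-reordering idea can be illustrated with a minimal sketch, assuming a binary object mask produced by the extraction method; the function name and the 16-pixel block size are illustrative, not prescribed by the invention.

```python
import numpy as np

def classify_blocks(object_mask, block=16):
    """Classify each block x block tile as background, object, or boundary.

    object_mask -- binary mask of extracted foreground objects
    Boundary tiles (mixing object and background pixels) are the ones
    identified above as causing poor block matching; they could be
    re-covered with smaller blocks so each block lies on one side only.
    """
    h, w = object_mask.shape
    classes = {}
    for y in range(0, h - block + 1, block):
        for x in range(0, w - block + 1, block):
            frac = object_mask[y:y + block, x:x + block].mean()
            if frac == 0.0:
                classes[(y, x)] = "background"  # static: cheap to code
            elif frac == 1.0:
                classes[(y, x)] = "object"      # moving: track as a unit
            else:
                classes[(y, x)] = "boundary"    # split into smaller blocks
    return classes
```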

Problems solved by technology

Decomposing a video sequence into VOPs is a very difficult task, and comparatively little research has been undertaken in this field.
An intrinsic problem of VOP generation is that objects of interest are not homogeneous with respect to low-level features such as color, intensity, or optical flow.
Thus, conventional segmentation algorithms will fail to obtain meaningful partitions.
From equation (1) it can also be seen that apparent motion is highly sensitive to noise because of the derivatives, which can cause largely incorrect results.
Unfortunately, we can only observe apparent motion.
In addition to the difficulties mentioned above, motion estimation algorithms have to solve the so-called occlusion and aperture problems.
The occlusion problem refers to the fact that no correspondence vectors exist for covered and uncovered background.
The aperture problem states that the number of unknowns is larger than the number of observations.
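
To see why the number of unknowns exceeds the number of observations, consider the brightness-constancy (optical flow) constraint, a standard identity from the motion estimation literature (not quoted from this patent):

```latex
% Brightness constancy: I(x+u, y+v, t+1) = I(x, y, t).
% First-order linearization yields the optical flow constraint:
\[
  I_x\,u + I_y\,v + I_t = 0 ,
\]
% one scalar equation per pixel in the two unknowns (u, v):
% only the flow component along the image gradient is observable,
% which is exactly the aperture problem. The noise sensitivity of
% the derivatives I_x, I_y, I_t causes the errors noted above.
```
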
1. Nonparametric representation, in which a dense field is estimated and each pixel is assigned a correspondence or flow vector. Block matching is then applied: the current frame is subdivided into blocks of equal size, and for each block the best match in the next (or previous) frame is computed. All pixels of a block are assumed to undergo the same translation and are assigned the same correspondence vector. The selection of the block size is crucial. Block matching is unable to cope with rotations and deformations; nevertheless, its simplicity and relative robustness make it a popular technique (a minimal sketch is given after this list). Nonparametric representations are not suitable for segmentation, because an object moving in 3-D space generates a spatially varying 2-D motion field even within the same region, except in the simple case of pure translation. This is why parametric models are commonly used in segmentation algorithms. However, dense field estimation is often the first step in calculating the model parameters.
2. Parametric models require a segmentation of the scene, which is our ultimate goal, and describe the motion of each region by a set of a few parameters. The motion vectors can then be synthesized from these model parameters. A parametric representation is more compact than a dense field description, and less sensitive to noise, because many pixels are treated jointly to estimate a few parameters.
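
The block matching of item 1 can be sketched as follows: a minimal Python example assuming grayscale frames and a sum-of-absolute-differences (SAD) criterion, with illustrative block and search-range sizes.

```python
import numpy as np

def block_matching(prev, curr, block=8, search=4):
    """Exhaustive-search block matching; a minimal sketch assuming
    2-D grayscale numpy arrays (not the patent's method).

    Returns one integer displacement (dy, dx) per block, so every
    pixel in a block shares the same correspondence vector.
    """
    h, w = curr.shape
    vectors = {}
    for y in range(0, h - block + 1, block):
        for x in range(0, w - block + 1, block):
            ref = curr[y:y + block, x:x + block].astype(int)
            best, best_sad = (0, 0), np.inf
            for dy in range(-search, search + 1):
                for dx in range(-search, search + 1):
                    yy, xx = y + dy, x + dx
                    if 0 <= yy <= h - block and 0 <= xx <= w - block:
                        cand = prev[yy:yy + block, xx:xx + block].astype(int)
                        sad = np.abs(ref - cand).sum()  # SAD criterion
                        if sad < best_sad:
                            best_sad, best = sad, (dy, dx)
            vectors[(y, x)] = best
    return vectors
```
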
Although parametric representations are less noise sensitive, they still suffer from the intrinsic problems of motion estimation.
The major drawbacks of this proposal are the computational complexity, and the need to specify the number of objects likely to be found.
The techniques of Adiv, Bouthemy and Francois, and Murray and Buxton incorporate only optical flow data into the segmentation decision; hence, their performance is limited by the accuracy of the estimated flow field.
These results are not good, since they exhibit over-segmentation, and the method is computationally expensive.
These approaches suffer from high computational complexity, and many algorithms need the number of objects or regions in the scene as an input parameter.
The result is an over-segmentation.
A drawback of this technique is the lack of temporal correspondence to enforce continuity in time.
However, due to its nature, the watershed algorithm suffers from the problems associated with region-growth techniques.
Thus, the above techniques will fail in many practical situations where objects do not correspond to partitions based on simple features like motion or color.
If in a sequence different views of the same object are shown, it is not possible to represent that object by a single image that is warped from frame to frame.
Further, the affine transformation of equation (6) might not be able to describe the motion of a complete layer in the presence of strongly non-rigid motion, such as a person walking.
Finally, the layer construction process makes real-time execution impossible, because a longer sequence of frames is required.
This technique is not accurate, and the segments it produces are too big.
Optical flow or motion fields could be used, but they are extremely noise sensitive, and their accuracy is limited due to the aperture and occlusion problems.
Decomposing video sequences into VOPs is in many cases very difficult.
Manual segmentation, on the other hand, is often too time consuming.
Partitioning a video sequence into VOPs by means of automatic or semiautomatic segmentation is a very challenging task.
At the moment, we are not aware of any algorithm that can automatically perform VOP segmentation accurately and reliably for generic video sequences.
The main difficulty is to formulate semantic concepts in a form suitable for a segmentation algorithm.
There are some major drawbacks of CDMs for VOP segmentation.
The estimated flow field, on the other hand, demonstrates how difficult it can be to group pixels into objects based on the similarity of their flow vectors.
However, transmission channels frequently add corrupting noise and have limited bandwidth (such as cellular phone and wireless networks).
However, the foregoing MPEG compression methods result in a number of unacceptable artifacts, such as blockiness and unnatural object motion, when operated at very low bit rates.
Usually these block boundaries do not correspond to physical boundaries of the moving objects and hence visually annoying artifacts result.
Unnatural motion arises when the limited bandwidth forces the frame rate to fall below that required for smooth motion.
The compressed video data is then transmitted over communication channels, which are prone to errors.
For video coding schemes that exploit temporal correlation in the video data, channel errors result in the decoder losing synchronization with the encoder.
Unless suitably dealt with, this can result in noticeable degradation of the picture quality.
However, error protection schemes come with the price of an increased bit rate.
Moreover, it is not possible to correct all possible errors using a given error-control code.
In fact, a typical channel, such as a wireless channel, over which compressed video is transmitted is characterized by high random bit error rates (BER) and multiple burst errors.
Problems arise when codes are used over channels prone to burst errors because the errors tend to be clustered in a small number of received symbols.
The method is not fully automatic and requires user interaction.
None of these prior art patents are capable of robust and stable automatic object extraction and segmentation.


Examples


Case I

[0071] After a “YES” decision, at least two input video frames, a “first” frame I1 and a “second” frame I2, preferably in color, are read in a “frame reading” step 26. While the method can work with two, three, four, etc. frames, we will henceforth refer to the two “first and second” frames in the sense of “at least two” frames. This is followed by a reciprocal illumination flattening (correction) step 28 of I1 and I2, preferably performed by a relative re-normalization of the pixel values through an edge-preserving smoothing operation. Step 28 yields smoothed frames (outputs) I1c and I2c, respectively. Next, a statistical model-based “change detection” step 30 is applied between smoothed frames I1c and I2c. This generates a difference image (in float) I12D 34 that shows the moving objects. This is followed by a local adaptive thresholding step 36 on image I12D, which may include a region-growing substep; the region-growing is based on this local threshold. A binary image I1...
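
A minimal Python sketch of the Case I pipeline follows, with common OpenCV operators standing in for the patented ones: the bilateral filter, absolute difference, and Gaussian-window adaptive threshold are substitutions for steps 28, 30 and 36, and the weight test and region-growing substep are omitted.

```python
import cv2
import numpy as np

def case_i_pipeline(frame1, frame2):
    """Case I (high frame rate): illumination flattening ->
    change detection -> local adaptive thresholding."""
    # Step 28 stand-in: edge-preserving smoothing in place of the
    # reciprocal illumination flattening of I1 and I2.
    i1c = cv2.bilateralFilter(frame1, d=9, sigmaColor=75, sigmaSpace=75)
    i2c = cv2.bilateralFilter(frame2, d=9, sigmaColor=75, sigmaSpace=75)

    # Step 30 stand-in: per-pixel absolute difference in place of the
    # statistical model-based change detection; output kept in float.
    i12d = np.abs(i1c.astype(np.float32) - i2c.astype(np.float32))
    if i12d.ndim == 3:
        i12d = i12d.max(axis=2)  # collapse color channels

    # Step 36 stand-in: local (windowed) adaptive thresholding to a
    # binary image; the patent's weight test is not reproduced here.
    norm = cv2.normalize(i12d, None, 0, 255, cv2.NORM_MINMAX).astype(np.uint8)
    binary = cv2.adaptiveThreshold(norm, 255, cv2.ADAPTIVE_THRESH_GAUSSIAN_C,
                                   cv2.THRESH_BINARY, blockSize=31, C=-5)
    return binary
```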

Case II

[0072] After a “NO” decision, at least two input video frames I1 and I2, preferably in color, are read in a “frame reading” step 26′, identical with step 26 in Case I. This is followed by a reciprocal illumination flattening (correction) step 28′ of I1 and I2, identical with step 28. Step 28′ yields smoothed frames (outputs) I1c and I2c, respectively. In parallel with step 28′, and unlike in Case I, an edge detection step 48 is applied on I1 to produce a first edge image I1E. Next, a statistical model-based “change detection” step 30′, identical with step 30, is applied between smoothed frames I1c and I2c. This generates a difference image (in float) I12D 34′ that shows the moving objects. Next, a global adaptive thresholding step 50 is applied on first edge image I1E. Unlike the local thresholding of step 36 in Case I, global thresholding step 50 does not include a region-growing procedure. The output of global adaptive thresholding step 50 is a...
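
A minimal sketch of the additional Case II edge path, again with OpenCV stand-ins: Sobel edge magnitude substitutes for edge detection step 48 and Otsu's global threshold for step 50. The ANDing with the difference image follows the description summarized in the Abstract below.

```python
import cv2
import numpy as np

def case_ii_edge_path(frame1, diff_image):
    """Case II (low frame rate): edge detection on I1, global
    thresholding to a binary edge image, then ANDing with the
    change-detection difference image."""
    gray = cv2.cvtColor(frame1, cv2.COLOR_BGR2GRAY)

    # Step 48 stand-in: Sobel gradient magnitude as the edge image I1E.
    gx = cv2.Sobel(gray, cv2.CV_32F, 1, 0)
    gy = cv2.Sobel(gray, cv2.CV_32F, 0, 1)
    i1e = cv2.magnitude(gx, gy)

    # Step 50 stand-in: one global threshold (Otsu) over the whole edge
    # image; no region growing, unlike step 36 of Case I.
    i1e8 = cv2.normalize(i1e, None, 0, 255, cv2.NORM_MINMAX).astype(np.uint8)
    _, edge_binary = cv2.threshold(i1e8, 0, 255,
                                   cv2.THRESH_BINARY + cv2.THRESH_OTSU)

    # AND the binary edge image with a binarized difference image; the
    # result is then fed to the local adaptive thresholding stage.
    diff8 = cv2.normalize(diff_image, None, 0, 255,
                          cv2.NORM_MINMAX).astype(np.uint8)
    _, diff_binary = cv2.threshold(diff8, 0, 255,
                                   cv2.THRESH_BINARY + cv2.THRESH_OTSU)
    return cv2.bitwise_and(edge_binary, diff_binary)
```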


Abstract

A method for automatic, stable and robust object extraction of moving objects in color video frames, achieved without any prior knowledge of the video content. For high rate video, the method includes providing at least first and second high frame rate video frames, performing a reciprocal illumination correction of the first and second video frames to yield respective first and second smoothed frames, performing a change detection operation between the first and second smoothed frames to obtain a difference image, and performing a local adaptive thresholding operation on the difference image to generate a binary image containing extracted objects, the local thresholding operation using a weight test to determine a boundary of each of the extracted objects. For an extracted object with a fragmented boundary, the method further comprises re-unifying the boundary. For low rate video, additional steps include: an edge detection applied to the first frame to yield a first edge image, a global thresholding applied to the first edge image to yield a first binary edge image, and an ANDing operation on the difference image and the first binary edge image to generate a second binary image, which is fed to the local adaptive thresholding operation.

Description

FIELD AND BACKGROUND OF THE INVENTION

[0001] The term image segmentation refers to the partition of an image into a set of non-overlapping regions that cover it. An object is composed of one or more segments, and the term image segmentation is thus closely associated with “object extraction”, the definition of the latter being well known. Image segmentation is probably one of the most important low-level techniques in vision, since virtually any computer vision algorithm incorporates some sort of segmentation. In general, a segmentation groups together pixels that share common properties. The properties of a good image segmentation are defined as follows: regions or segments in the image segmentation should be uniform and homogeneous with respect to some characteristic, such as gray tone or texture; region interiors should be simple and without many small holes; adjacent regions should have significantly different values with respect to the characteristic on which they are ...


Application Information

IPC(8): G06K9/00; G06T5/00; G06V10/28
CPC: G06K9/38; G06T7/0083; G06T7/0097; G06T7/2006; G06T2207/20064; G06T2207/10016; G06T2207/20012; G06T7/12; G06T7/174; G06T7/215; G06V10/28
Inventors: AVERBUCH, AMIR; MILLER, OFER
Owner: F POSZAT HU