Efficient video skimmer

A video skimmer that uses layered video coding to provide reduced spatio-temporal resolution versions of a video, reducing bandwidth consumption and processing during content browsing.

Inactive Publication Date: 2010-10-28
VIDYO

AI Technical Summary

Benefits of technology

[0052]According to the invention, the video may be compressed using a layered codec, such as the one disclosed in ITU-T Recommendation H.264 Annex G (also known as SVC). In order to take full advantage of the invention, the scalable video bitstream that is stored, among other things, in the full length video file, should contain at least one low resolution version of the video content, advantageously as a base layer. The low resolution can be stored in the form of a base layer and one or more enhancement layers; however, the mentioned combination of base and / or enhancement layers, after decoding, still results in the low resolution. The resolution can be chosen such that it is suitable, after decoding, for displaying in a mini browsing window (MBW) of the video skimmer display. An MBW can be smaller in spatial size than a full window, which can be optimized to view the full resolution video. Full resolution video may be obtained by decoding a base layer and at least one enhancement layer more than required for the lower resolution. The sizes of the full window and any MBWs can be chosen by the user according to his / her user preferences. The system can include a user interface that can display many MBWs, and each MBW can display a specific video chapter of the full length video. The user interface can also allow the user to set his / her user preferences, for example, number of MBWs, size of each MBW, start time or duration of each video chapter, assignment of chapters to MBWs, and so forth.
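The layer selection described in [0052] can be sketched as follows. This is a minimal illustration, not the patent's implementation and not the API of any SVC library: the `Layer` record, the function name `layers_for_window`, and the example resolutions are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Layer:
    """Hypothetical description of one scalable layer (illustrative only)."""
    layer_id: int  # 0 = base layer, 1 and up = enhancement layers
    width: int     # decoded width when this layer and all lower layers are decoded
    height: int    # decoded height under the same condition


def layers_for_window(layers: List[Layer], win_w: int, win_h: int) -> List[int]:
    """Return the ids of the layers to decode for a window of win_w x win_h.

    Picks the lowest layer whose decoded resolution covers the window and
    returns it together with all layers below it. If no layer covers the
    window, all layers are returned (full resolution).
    """
    ordered = sorted(layers, key=lambda layer: layer.layer_id)
    for i, layer in enumerate(ordered):
        if layer.width >= win_w and layer.height >= win_h:
            return [l.layer_id for l in ordered[: i + 1]]
    return [l.layer_id for l in ordered]


# Assumed example stream: resolutions are made up for illustration.
stream = [Layer(0, 320, 180), Layer(1, 640, 360), Layer(2, 1280, 720)]
mbw_layers = layers_for_window(stream, 320, 180)     # mini browsing window
full_layers = layers_for_window(stream, 1280, 720)   # full window
```

In this sketch an MBW needs only the base layer, while the full window needs the base layer plus both enhancement layers, which mirrors the statement that full resolution requires at least one layer more than the lower resolution.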
[0053]The term “codec” is equally used herein to describe techniques for encoding and decoding, and for implementations of these techniques. An encoder converts input media data into a bitstream or a packet stream, and a decoder converts an input bitstream or packet stream into a media representation suitable for presentation to a user, for example digital or analog video ready for presentation through a monitor, or digital or analog audio ready for presentation through loudspeakers. A transcoder converts an input bitstream or packet stream compressed using one compression technique into its original media representation suitable for presentation to a user, and then re-converts that representation into an output bitstream or packet stream using another type of compression technique. Encoders and decoders can be dedicated hardware devices or building blocks of a software-based implementation running on a general purpose CPU.
[0054]Set-top-boxes and personal computers (PCs) can be built such that many encoders or decoders may run in parallel or quasi-parallel. For hardware encoders or decoders, one way to support multiple encoders / decoders is to integrate multiple of their instances in the set-top-box or PC. For software implementations, similar mechanisms can be employed.
[0055]Traditional video codecs used in video distribution systems provide only a single bit stream at a given bitrate, and no layers. As explained above, when a lower temporal or spatial resolution is required from a full length video file (such as for fast forwarding or for display at a smaller spatial size in a MBW), first, the full resolution file must be decoded to regenerate the raw (uncompressed) video, which then needs to be sub-sampled in temporal and / or spatial dimension, as the case may be, to produce a lower spatio-temporal resolution appropriate for the MBW. This process wastes significant bandwidth (if the full length video file is in a remote location and needs to be transported over a network), time, and computational resources. However, support for lower resolutions is beneficial in the video skimmer to enable display of many video chapters simultaneously, and without consuming processing time and power to generate them. The network bandwidth required to transport video for many MBWs may also be advantageously minimized.
[0056]In one embodiment, a skimmer may support “spatial skimming”. A full length video file available in a layered encoded format may readily carry a low resolution version of the actual video content, which may fit into MBWs of the video skimmer system without further spatial sub-sampling after decoding. The skimmer may simultaneously display more than one MBW showing more than one chapter. The user may enlarge the video of a chapter by clicking on the MBW once he / she identifies the scene of interest in an MBW. As a result, the skimmer can request and receive information that enables the skimmer to present to the user a high resolution version of the video content, as disclosed in the co-pending U.S. patent application entitled “Systems, Methods and Computer Readable Media for Instant Multi-Channel Video Content Browsing in digital Video Distribution Systems”, concurrently filed herewith.
[0057]In the same or another embodiment, a video skimmer can support temporal skimming. A full length video file available in a layered encoded format may readily carry a temporally sub-sampled lower layer. The skimmer may disregard the timing information in the lower layer and present the video as fast forward video. For example, if the full length video were originally available at 30 fps, and the temporally sub-sampled lower layer is available at 10 fps, the skimmer may display the 10 fps lower layer at 30 fps, thereby speeding up playback by a factor of 3. Once the user clicks on the MBW presenting the fast forward video, the skimmer may display the MBW's content at the original speed (in the example, by slowing down playback to 10 fps). It may further request and receive temporal scalable enhancement layers that enable full temporal resolution of the MBW's content.
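The arithmetic of the temporal skimming example in [0057] can be checked with a short sketch. The function names are hypothetical and chosen only for this illustration; exact fractions are used so that the ratios stay exact.

```python
from fractions import Fraction


def fast_forward_factor(layer_fps: int, display_fps: int) -> Fraction:
    """Speed-up when a layer coded at layer_fps is displayed at display_fps."""
    if layer_fps <= 0 or display_fps <= 0:
        raise ValueError("frame rates must be positive")
    return Fraction(display_fps, layer_fps)


def skim_duration(content_seconds: int, layer_fps: int, display_fps: int) -> Fraction:
    """Seconds needed to show content_seconds of video in fast forward.

    The lower layer holds content_seconds * layer_fps frames; ignoring the
    coded timing, they are presented at display_fps.
    """
    frames_in_layer = content_seconds * layer_fps
    return Fraction(frames_in_layer, display_fps)


factor = fast_forward_factor(layer_fps=10, display_fps=30)  # the 10 fps layer shown at 30 fps
minute = skim_duration(content_seconds=60, layer_fps=10, display_fps=30)
```

With the figures from the text, the factor is 3 and one minute of content is skimmed in 20 seconds. Displaying the same layer at its coded rate of 10 fps gives a factor of 1, which is the original-speed case described above.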

Problems solved by technology

Currently, these videos are often of small size and low resolution.
Even after applying digital video compression techniques, high resolution video results in large file sizes.
“Skimming” video, alternatively known as browsing, has been a technical challenge for a long time.
a. Fast forwarding: fast forwarding (also known as increasing the video playback speed) shortens the video viewing time. However, speeding up the video rate distorts the video information and may cause elimination of short events. This method has been the most popular browsing technique so far. Fast forwarding is discussed in more detail below.
b. Text Based Queries: This refers to querying metadata associated with the full length video or video chapters for specific textual information. For example, a text based query may be in the form of “scene with George falling off the bridge”. Text based queries today require the video to be annotated, mostly a manual process, before the video can be queried. Although text-based video query has been in existence for a long time, only a few applications can afford the intense human effort needed to intelligently categorize and annotate the videos. One example of video content that contains metadata which enables text based queries is medical records used in some systems.
c. Automatic Indexing: In the academic literature [for example, Cees G. M. Snoek and Marcel Worring, “Multimodal Video Indexing: A Review of the State-of-the-art,” Multimedia Tools and Applications, Volume 25, Number 1 / January, 2005, Springer], techniques have been proposed to automatically index video for browsing representations based on information within the video. These indexing systems can use, for example, any of the following information aspects to generate video chapters:
  • Motion of the video;
  • Scene changes;
  • Image statistics, such as color and shape;
  • Audio information; and / or
  • Specific object types in the video.
Today, when using any form of automated video skim generation, it is unfortunately quite frequent that a certain scene, in which a user may be interested, stays unidentified by the skimming process.
In summary, the automated context-sensitive generation of video skims, despite the significant research conducted over the past decade, has remained a task that is difficult, requiring high computational complexity and involving human interaction such as filtering and processing.
(1) The search may still take a long time depending on where the specific video segment of interest is located within the full length video sequence (particularly if it is located towards the end).
(2) The video segment of interest may be made unnoticeable or totally lost during sub-sampling as it may fall on the deleted frames (especially when large sub-sampling intervals are in use).
(3) The associated audio information, if any, often cannot be meaningfully presented.
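Disadvantage (2) above can be made concrete with a small sketch. It assumes the simplest form of temporal sub-sampling, in which every `interval`-th frame is kept starting at frame 0; the function name and the frame numbers are illustrative assumptions, not taken from the source.

```python
def event_survives(event_start: int, event_len: int, interval: int) -> bool:
    """Return True if at least one frame of an event is kept.

    The event occupies frames [event_start, event_start + event_len).
    Sub-sampling is assumed to keep frames 0, interval, 2 * interval, ...
    """
    if interval <= 0 or event_len <= 0 or event_start < 0:
        raise ValueError("interval and event_len must be positive, event_start non-negative")
    # First kept frame index at or after event_start (ceiling to a multiple of interval).
    first_kept = -(-event_start // interval) * interval
    return first_kept < event_start + event_len


short_event_kept = event_survives(event_start=31, event_len=5, interval=10)
long_event_kept = event_survives(event_start=31, event_len=10, interval=10)
```

Under these assumptions, a 5-frame event starting at frame 31 falls entirely between kept frames 30 and 40 and disappears, while an event at least as long as the sub-sampling interval always retains at least one frame.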
This process may require “spatial sub-sampling” to reduce the resolution of the original video to fit into smaller windows, because of display size limitations as illustrated in FIG. 2.
(1) Performing spatial sub-sampling in real-time to generate smaller versions of the full-length video is computationally intensive and time consuming. Depending on how many windows are generated and the size of each window, the sub-sampling may require significant computing resources.
(2) The information may be lost during sub-sampling due to side effects of spatial sub-sampling such as filtering or aliasing.
However, the compressed video file cannot be temporally sub-sampled at arbitrary positions, because compressed frames may depend on other frames due to inter-picture prediction.
If there are no IDR frames or if their frequency is low, then fast forwarding will not be feasible without decoding a large percentage of the coded pictures of the full length video sequence.
(1) It may be time consuming and / or computationally expensive.
(2) With an increase of the number of IDR frames, the compression ratio decreases. The transcoded full length sequence with a higher number of IDR frames may be significantly larger than the original compressed full length sequence.
(3) The disadvantages of fast forwarding with raw files still remain.
The process of sub-dividing a compressed file suffers from similar disadvantages as temporal sub-sampling.
Further, using traditional video compression technologies, spatial sub-sampling is not possible in the compressed domain.
Moreover, although use of compressed video file eliminates the disadvantage of storing a large file, the need to decode the file several times in real-time introduces significant additional cost and processing complexity to spatial re-sampling.
If the full length video file, be it in raw or compressed format, is not available locally, the problem of video skimming according to the described techniques is further exacerbated by the need to retrieve it in real-time to a local computer over a network like the public Internet.
Particularly, if the file is in raw format, then the bandwidth requirements are impractically large (e.g., 45 Mbps for a reasonable speed download of an SDTV resolution sequence).
Accordingly, given the issues of using raw and compressed video, and using temporal and spatial sub-sampling, there has not been an acceptable implementation of a practical real-time video skimmer in the market place.

Embodiment Construction

[0061]FIG. 4 shows a standalone video skimmer (401) with an attached display (402). The video skimmer can receive video content from a variety of sources: live feed video content, for example from a camera (403) connected to the video skimmer through interface (404); video content from a DVD (405) attached to the video skimmer through interface (406); or even in the form of a full length video file from an external and / or remote video database (407). The external remote video database can be located on the Internet (409) or other suitable networks, and is reached using network interfaces (410, 411). The MBWs are presented on display (402) (as depicted in FIG. 3). The video skimmer logic is part of the video skimmer (401). The video skimmer can be implemented based on a general purpose computer, e.g., a PC, a standalone computer or some other type of hardware, such as a set-top box in an IPTV environment where the set-top box may be attached to a suitable network (409) such as the Internet.

[0062]In case...

Abstract

Disclosed are a system, method, apparatus, and computer readable media containing instructions for displaying video files for rapid searching. Two types of exemplary embodiments are disclosed: a standalone video skimming system, and a video skimming system that includes a server and a client system. In both, the video file may be locally or remotely stored, or can be obtained from a live feed. The system displays many small windows simultaneously, in which different parts of the video chosen by the user are shown at the same time to shorten the skimming time. The video file is encoded using layered encoding so that smaller versions can be displayed using lower layers, without needing any processing to generate smaller versions of the video from the original full screen version. A video extractor is described for extracting the necessary bitstreams from a local video database containing layered encoded video files according to user specified window sizes, and distributing the signals over the electronic communications network channel. The system also includes a skimming control logic which can receive control commands from clients and invoke the video extractor to extract appropriate audio-visual signals therefrom for each command.
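How simultaneous windows shorten skimming time can be sketched under one assumed policy: the video is split into equal-length chapters, one per window. The source leaves chapter start times and durations to user preference, so the equal split, the function names, and the one-hour example are assumptions for illustration only.

```python
from typing import List, Tuple


def equal_chapters(duration_s: float, num_windows: int) -> List[Tuple[float, float]]:
    """Split [0, duration_s) into num_windows equal (start, end) chapters."""
    if duration_s <= 0 or num_windows <= 0:
        raise ValueError("duration_s and num_windows must be positive")
    length = duration_s / num_windows
    return [(i * length, (i + 1) * length) for i in range(num_windows)]


def skim_time(chapters: List[Tuple[float, float]]) -> float:
    """Wall-clock time to watch all chapters when they play simultaneously."""
    return max(end - start for start, end in chapters)


# Assumed example: a one-hour video shown in four windows at normal speed.
chapters = equal_chapters(duration_s=3600, num_windows=4)
total = skim_time(chapters)
```

With four windows, each chapter covers 900 seconds, so the whole hour is covered in 900 seconds of viewing even before any temporal skimming is applied.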

Description

CROSS-REFERENCE TO RELATED APPLICATIONS

[0001]This application claims the benefit of priority to U.S. Provisional Application Ser. No. 61/172,355, filed Apr. 24, 2009, which is hereby incorporated by reference herein in its entirety.

BACKGROUND

[0002]1. Technical Field
[0003]The disclosed invention relates to techniques for searching for content in a compressed digital video file accessed from local storage or over a network such as the Internet. In particular, it relates to the use of layered video coding technology in connection with content searching for retrieving and displaying selected video segments.
[0004]2. Background Art
[0005]Subject matter related to the present application can be found in co-pending U.S. patent application Ser. Nos. 12/015,956, filed Jan. 17, 2008 and entitled “System And Method For Scalable And Low-Delay Videoconferencing Using Scalable Video Coding,” 11/608,776, filed Dec. 8, 2006 and entitled “Systems And Methods For Error Resilience And Random Access In V...

Application Information

IPC(8): H04N5/14, H04N7/26
CPC: G11B27/105, H04N21/64322, H04N5/772, H04N5/775, H04N5/783, H04N5/85, H04N9/8042, H04N9/8205, H04N21/234327, H04N21/234363, H04N21/234381, H04N21/431, H04N21/4314, H04N21/4383, H04N21/4384, H04N21/440227, H04N21/4621, H04N21/8549, H04N19/61, H04N19/162, H04N19/33, H04N19/31, H04N19/40, H04N5/765
Inventor: CIVANLAR, MEHMET REHA; SHAPIRO, OFER; SHALOM, TAL
Owner VIDYO