Video codecs, part 5: Scalable video codec (SVC)

Part 5 introduces scalable video coders (SVC) and explores motion-compensated temporal filtering (MCTF), including a discussion of detecting covered pixels and bi-directional MCTF.

DSP DesignLine

This series is excerpted from "Multidimensional Signal, Image, and Video Processing and Coding."

Part 4 looks at Wavelet codecs.


11.5 Scalable Video Coders
In many applications of video compression, it is not known to the encoder what resolutions, frame rates, and/or qualities (bitrates) are needed at the receivers. Further, in a streaming or broadcast application, different values of these key video parameters may be needed across the communication paths or links. Scalable coders [47] have been advanced to solve this problem without the need for complicated and lossy transcoding.

Definition 11.5-1 (scalability)
A single coded video bitstream is scalable if it can be efficiently decomposed for use at many spatial resolutions (image sizes), frame rates, regions of interest, and bitrates.

A scalable bitstream that has all four types of scalability will be referred to as fully scalable. We can think of region-of-interest capability as a restricted type of object-based scalability that can, together with resolution scalability, support zoom and pan functionality. This capability can support browsing, wherein a small version of a high-resolution image may be "zoomed and panned" to locate interesting parts for a closer look at higher resolution.
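As a rough illustration of what "efficiently decomposed" can mean, the sketch below models a scalable bitstream as a list of packets, each tagged with the layer it belongs to, and extracts a lower operating point by simply discarding packets. The `Packet` layout, the `extract` function, and the three integer layer indices are assumptions made for this example; they do not correspond to the syntax of any particular codec, and region-of-interest scalability is left out for brevity.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Packet:
    """One unit of a hypothetical layered bitstream (not a real codec syntax)."""
    spatial: int    # 0 = lowest resolution
    temporal: int   # 0 = lowest frame rate
    quality: int    # 0 = coarsest quality (lowest bitrate)
    payload: bytes


def extract(stream: List[Packet], max_spatial: int,
            max_temporal: int, max_quality: int) -> List[Packet]:
    """Keep only the packets needed for the requested operating point.

    No decoding or re-encoding takes place: packets are kept or dropped.
    """
    return [p for p in stream
            if p.spatial <= max_spatial
            and p.temporal <= max_temporal
            and p.quality <= max_quality]


def build_demo_stream(levels: int = 2) -> List[Packet]:
    """Build a toy stream with `levels` layers along each scalability axis."""
    return [Packet(s, t, q, b"\x00")
            for s in range(levels)
            for t in range(levels)
            for q in range(levels)]


if __name__ == "__main__":
    stream = build_demo_stream()
    low = extract(stream, max_spatial=0, max_temporal=0, max_quality=0)
    print(len(stream), "packets in the full stream;", len(low), "in the base layer")
```

The point of the sketch is that a receiver or network node can reach a lower resolution, frame rate, or bitrate by dropping packets, which is what distinguishes scalable coding from transcoding.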

Scalability is needed in several areas, including digital television, heterogeneous computer networks, and database browsing, as well as for matching various display formats (e.g., adjusting to the window size on screen), matching various display frame rates, and coping with the load on a video-on-demand (VoD) server. Scalability in database browsing facilitates efficient pyramid search of image and video databases. Scalability on heterogeneous networks can enable a network manager to do dynamic load control to match link capacities as well as terminal computer capabilities. A scalable encoder can also better match a variable carrier-to-noise ratio (CNR) on wireless channels.

The standard coders H.26x and MPEGx have a limited amount of scalability, usually just one or two levels. For spatial scalability, MPEG 2 uses a pyramid coding method, where the base layer is coded conventionally, and then an enhancement layer supplements this base layer for higher resolution, frame rate, or quality. Figure 11-26 shows a coder targeted for spatial scalability in HD and SD television.
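The base-plus-enhancement idea can be sketched on a one-dimensional signal. This is a minimal illustration under simplifying assumptions, not the MPEG 2 algorithm: "conventional coding" of the base layer is replaced by plain uniform quantization, downsampling is pair averaging, upsampling is sample repetition, and all function names are made up for the example.

```python
def downsample(x):
    """Halve the resolution by averaging neighboring pairs (a crude lowpass)."""
    return [(x[i] + x[i + 1]) / 2 for i in range(0, len(x) - 1, 2)]


def upsample(x):
    """Double the resolution by repeating each sample."""
    return [v for v in x for _ in range(2)]


def quantize(x, step):
    """Uniform quantizer; stands in for a real layer codec in this sketch."""
    return [step * round(v / step) for v in x]


def encode(x, base_step, enh_step):
    """Return (base layer, enhancement layer) for an even-length signal."""
    base = quantize(downsample(x), base_step)
    prediction = upsample(base)          # what a base-only decoder can predict
    residual = [a - b for a, b in zip(x, prediction)]
    enh = quantize(residual, enh_step)   # enhancement layer refines the prediction
    return base, enh


def decode_base(base):
    """Low-resolution reconstruction from the base layer alone."""
    return list(base)


def decode_full(base, enh):
    """Full-resolution reconstruction from both layers."""
    return [p + e for p, e in zip(upsample(base), enh)]


if __name__ == "__main__":
    signal = [10, 12, 20, 24, 30, 28, 16, 8]
    base, enh = encode(signal, base_step=8, enh_step=2)
    print("base layer:", decode_base(base))
    print("full reconstruction:", decode_full(base, enh))
    print("samples coded:", len(base) + len(enh), "for", len(signal), "input samples")
```

Because the enhancement residual is formed against the decoded base layer, the full-resolution error is bounded by half the enhancement step. Note also that the two layers together carry more samples than the input, which is one way to read the "lack of data conservation" attributed to pyramid coders.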


Figure 11-26. Illustration of resolution scalability for SD and HD video using MPEG 2.

Separate spatial and temporal scalability profiles, with two or three levels only, were standardized in MPEG 2, but are seldom used in practice. The limitations of these scalable profiles are a lack of data conservation, a limited range of scalability, coding errors that are not confined to the baseband, and drift in the frequency-scalable coder (reportedly building up to 7 dB over 12 frames [20]).

In the research area, Burt and Adelson [3] introduced a scalable pyramid coder with the desirable quantizer-in-loop property that can frequency shape (roll off) lower resolution layers. Unfortunately, because it uses a Gaussian–Laplacian lowpass pyramid, there is a lack of data conservation. Also, because of the pyramid-based coding-and-recoding structure, the coding noise and artifacts spread from low (baseband) to higher spatial frequencies. Naveen's multiresolution SWT coder [28, 29], sketched in Figure 11-23 [see Part 4], is scalable in resolution with three spatial levels. There is no drift at lower resolution because the same motion information is used at the encoder as at the decoder. It uses efficient hierarchical MC that can incorporate rate constraints and frequency roll-off to make the lower resolution frames more video-like, i.e., reduce their spatial high frequencies. Table 11-4 gives some average luma PSNR values for HD test clips MIT and Surfside, obtained using forward MC.
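The data-conservation issue with a Gaussian–Laplacian pyramid can be made concrete by counting coefficients, reading "lack of data conservation" as overcompleteness (more coefficients to code than there are pixels). The sketch below assumes dyadic downsampling in both directions at every level and uses a 1920 x 1080 frame purely as an example size; the function names are illustrative.

```python
from fractions import Fraction


def pyramid_samples(width, height, levels):
    """Total coefficients in a pyramid whose level k is (width/2^k) x (height/2^k)."""
    return sum((width >> k) * (height >> k) for k in range(levels))


def pyramid_overhead(width, height, levels):
    """Ratio of pyramid coefficients to original pixels (1 means data-conserving)."""
    return Fraction(pyramid_samples(width, height, levels), width * height)


if __name__ == "__main__":
    for levels in (1, 2, 3):
        ratio = pyramid_overhead(1920, 1080, levels)
        print(levels, "level(s):", float(ratio), "coefficients per pixel")
```

As the number of levels grows, the ratio approaches 4/3, whereas a critically sampled subband/wavelet decomposition keeps exactly one coefficient per pixel.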


Table 11-4. PSNR results—forward motion compensation.

The Taubman and Zakhor [41] MCTF algorithm featured global motion compensation, layered (embedded) quantization, and conditional zero coding to facilitate SNR scalability. The published PSNR results for two common CIF test clips at 1.5 Mbps, coding only the luma (Y) component, are shown in Table 11-5. Results were good on panning motion clips, but less efficient on clips with detailed local motion. Also provided in the table is an MPEG 1 result for comparison.
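To show how layered (embedded) quantization supports SNR scalability, the sketch below decodes 8-bit samples from only their most significant bit-planes and reports the resulting PSNR. It is a generic bit-plane illustration, not the quantizer of [41]: the sample range, the midpoint reconstruction rule, and the function names are all assumptions made for the example.

```python
import math


def truncate_bitplanes(samples, planes_kept, total_planes=8):
    """Reconstruct non-negative integers from their top `planes_kept` bit-planes.

    The dropped low-order bits are replaced by the midpoint of their range.
    """
    drop = total_planes - planes_kept
    if drop == 0:
        return list(samples)
    half = 1 << (drop - 1)
    return [((s >> drop) << drop) + half for s in samples]


def psnr(reference, test, peak=255):
    """Peak signal-to-noise ratio in dB; infinite for a perfect reconstruction."""
    mse = sum((a - b) ** 2 for a, b in zip(reference, test)) / len(reference)
    return math.inf if mse == 0 else 10 * math.log10(peak ** 2 / mse)


if __name__ == "__main__":
    luma = list(range(256))  # toy "luma" data covering every 8-bit value
    for planes in range(1, 9):
        decoded = truncate_bitplanes(luma, planes)
        print(planes, "bit-plane(s):", round(psnr(luma, decoded), 2), "dB")
```

Each additional bit-plane refines the same embedded representation, so a decoder can stop at whatever rate the channel allows and still obtain a usable, if coarser, reconstruction.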
