[Part 2 looks at the basics of digital video. For the accompanying intro to audio, see Fundamentals of embedded audio.]
As consumers, we're intimately familiar with video systems in many embodiments. However, from the embedded developer's viewpoint, video represents a tangled web of different resolutions, formats, standards, sources and displays.
In this series, we will strive to untangle some of this intricate web, focusing on the most common circumstances you're likely to face in today's media processing systems. After reviewing the basics of video, we will discuss some common scenarios you may encounter in embedded video design and provide some tips and tricks for dealing with challenging video design issues.
Human Visual Perception
Let's start by discussing a little physiology. As we'll see, understanding how our eyes work has paved an important path in the evolution of video and imaging.
Our eyes contain two types of vision cells: rods and cones. Rods are primarily sensitive to light intensity as opposed to color, and they give us night vision capability. Cones, on the other hand, are not tuned to intensity, but instead are sensitive to wavelengths of light between 400 nm (violet) and 770 nm (red). Thus, the cones provide the foundation for our color perception.
There are three types of cones, each with a different pigment that's most sensitive to red, green or blue energy, although there's a lot of overlap among the three responses. Taken together, the response of our cones peaks in the green region, at around 555 nm. This is why, as we'll see, we can make compromises in LCDs by assigning the Green channel more bits of resolution than the Red or Blue channels.
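To make the "more bits for Green" idea concrete, here is a minimal sketch of one common 16-bit pixel layout, RGB565, which gives Green 6 bits and Red and Blue 5 bits each. RGB565 is not named in this section; it is assumed here as a representative example, and the function names are our own.

```python
def pack_rgb565(r, g, b):
    """Pack 8-bit R, G, B values into one 16-bit RGB565 word."""
    # Keep the top 5 bits of red and blue, and the top 6 bits of green.
    return ((r >> 3) << 11) | ((g >> 2) << 5) | (b >> 3)


def unpack_rgb565(word):
    """Expand a 16-bit RGB565 word back to approximate 8-bit R, G, B."""
    r5 = (word >> 11) & 0x1F
    g6 = (word >> 5) & 0x3F
    b5 = word & 0x1F
    # Replicate the high bits into the low bits so full scale maps to 255.
    r = (r5 << 3) | (r5 >> 2)
    g = (g6 << 2) | (g6 >> 4)
    b = (b5 << 3) | (b5 >> 2)
    return r, g, b
```

For instance, pack_rgb565(0, 255, 0) yields 0x07E0: Green occupies the six middle bits, so it resolves 64 levels where Red and Blue resolve only 32.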
The discovery of the Red, Green and Blue cones ties into the development of the trichromatic color theory, which states that almost any color of light can be conveyed by combining proportions of monochromatic Red, Green and Blue wavelengths.
Because our eyes have a lot more rods than cones, they are more sensitive to intensity than color. This allows us to save bandwidth in video and image representations by subsampling the color information.
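As a rough illustration of color subsampling, the sketch below separates each pixel into one intensity value (Y) and two color-difference values (Cb, Cr), then keeps only one Cb and one Cr sample per 2x2 block of pixels. The conversion coefficients are the full-range BT.601 values used by JPEG, and the 2x2 averaging corresponds to the layout usually called 4:2:0; both are assumptions chosen for the example, not something this section specifies.

```python
def rgb_to_ycbcr(r, g, b):
    """Full-range BT.601 conversion (assumed here; other matrices exist)."""
    y = 0.299 * r + 0.587 * g + 0.114 * b
    cb = 128 - 0.168736 * r - 0.331264 * g + 0.5 * b
    cr = 128 + 0.5 * r - 0.418688 * g - 0.081312 * b
    return y, cb, cr


def subsample_2x2(plane):
    """Average each 2x2 block of a plane (even width and height assumed)."""
    out = []
    for row in range(0, len(plane), 2):
        out_row = []
        for col in range(0, len(plane[0]), 2):
            total = (plane[row][col] + plane[row][col + 1]
                     + plane[row + 1][col] + plane[row + 1][col + 1])
            out_row.append(total / 4)
        out.append(out_row)
    return out


def to_420(rgb_image):
    """Turn rows of (r, g, b) tuples into full-res Y and quarter-res Cb, Cr."""
    converted = [[rgb_to_ycbcr(*px) for px in row] for row in rgb_image]
    y = [[px[0] for px in row] for row in converted]
    cb = [[px[1] for px in row] for row in converted]
    cr = [[px[2] for px in row] for row in converted]
    return y, subsample_2x2(cb), subsample_2x2(cr)


def sample_count(*planes):
    """Total number of stored samples across the given planes."""
    return sum(len(row) for plane in planes for row in plane)
```

A 4x4 RGB image holds 48 samples. After to_420 it holds 16 Y samples plus 4 Cb and 4 Cr samples, 24 in all, so the data is halved while the intensity detail is left untouched.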
Our perception of brightness is roughly logarithmic, not linear. In other words, the actual intensity required to produce a 50% gray image (one that appears exactly midway between total black and total white) is only around 18% of the intensity we need to produce total white. This characteristic is extremely important in camera sensor and display technology, as we'll see in our discussion of gamma correction. Also, this effect leads to a reduced sensitivity to quantization distortion at high intensities, a trait that many media encoding algorithms use to their advantage.
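One way to check the 18% figure numerically is with the CIE 1976 lightness function L*, a standard model of perceived lightness on a 0 to 100 scale. Note that L* is a cube-root curve rather than a true logarithm, and using it here is our assumption; the point is only that it reproduces the nonlinear relationship described above.

```python
def cie_lightness(relative_luminance):
    """CIE 1976 L* (0..100) from relative luminance Y (0..1, white = 1)."""
    if relative_luminance > 216 / 24389:  # about 0.008856
        # Cube-root segment covering nearly the whole range.
        return 116 * relative_luminance ** (1 / 3) - 16
    # Linear segment very close to black.
    return (24389 / 27) * relative_luminance
```

Here cie_lightness(0.18) comes out at about 49.5, so a surface reflecting 18% of the light is judged to sit almost exactly halfway between black (0) and white (100).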
Another visual novelty is that our eyes continually adjust to the viewing environment, creating their own reference for white, even in low-lighting or artificial-lighting situations. Because camera sensors don't innately act the same way, this gives rise to a white balance control in which the camera picks its reference point for absolute white.
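A simple way to see what a white balance control does is the "gray world" heuristic, which assumes the scene averages out to neutral gray and scales each channel to make that true. This is only one of several approaches and is not necessarily what any given camera uses; the function names are hypothetical, and the sketch assumes every channel has a nonzero average.

```python
def gray_world_gains(pixels):
    """Per-channel gains that equalize the average R, G and B of an image.

    pixels is a flat list of (r, g, b) tuples; each channel average must be
    nonzero, otherwise the division below fails.
    """
    n = len(pixels)
    avg_r = sum(p[0] for p in pixels) / n
    avg_g = sum(p[1] for p in pixels) / n
    avg_b = sum(p[2] for p in pixels) / n
    gray = (avg_r + avg_g + avg_b) / 3
    return gray / avg_r, gray / avg_g, gray / avg_b


def apply_gains(pixels, gains):
    """Scale each channel by its gain, round, and clip to the 8-bit range."""
    return [tuple(min(255, round(c * g)) for c, g in zip(p, gains))
            for p in pixels]
```

Given an image with a strong reddish cast, such as [(200, 100, 50), (100, 50, 25)], the gains pull Red down and push Blue up until the three channel averages match, which removes the cast.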
Perhaps most important for image and video codecs, the eye is less sensitive to high-frequency information than low-frequency information. What's more, although it can detect fine details and color resolution in still images, it cannot do so for rapidly moving images. As a result, transform coding (DCT, FFT, etc.) and low-pass filtering can be used to reduce the total bandwidth needed to represent an image or video sequence.
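The sketch below shows the transform-and-discard idea on a single row of samples: take a one-dimensional DCT, zero the high-frequency coefficients, and transform back. It is a deliberately slow, direct implementation of the orthonormal DCT-II and its inverse, written for clarity. Real codecs work on two-dimensional blocks and quantize coefficients rather than simply dropping them, so treat this as an illustration only; the function names and the keep parameter are our own.

```python
import math


def dct(signal):
    """Orthonormal DCT-II of a list of samples."""
    n = len(signal)
    out = []
    for k in range(n):
        total = sum(x * math.cos(math.pi * (i + 0.5) * k / n)
                    for i, x in enumerate(signal))
        scale = math.sqrt(1 / n) if k == 0 else math.sqrt(2 / n)
        out.append(scale * total)
    return out


def idct(coeffs):
    """Inverse of dct() above (orthonormal DCT-III)."""
    n = len(coeffs)
    out = []
    for i in range(n):
        total = 0.0
        for k, c in enumerate(coeffs):
            scale = math.sqrt(1 / n) if k == 0 else math.sqrt(2 / n)
            total += scale * c * math.cos(math.pi * (i + 0.5) * k / n)
        out.append(total)
    return out


def low_pass(signal, keep):
    """Keep the first `keep` DCT coefficients and zero the rest."""
    coeffs = dct(signal)
    truncated = coeffs[:keep] + [0.0] * (len(coeffs) - keep)
    return idct(truncated)
```

Calling low_pass(row, 4) on an 8-sample row stores half as many coefficients and returns a smoothed version of the row. Because the eye is less sensitive to the fine detail that was removed, the loss is often hard to see.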
Our eyes can notice a "flicker" effect at image update rates less than 50-60 times per second, or 50-60 Hz, in bright light. Under dim lighting conditions, this rate drops to about 24 Hz. Additionally, we tend to notice flicker in large uniform regions more so than in localized areas. These traits have important implications for interlaced video, refresh rates and display technologies.