[Part 1 looks at error concealment strategies and error resilient coding for waveform and CELP speech codecs. Part 2 examines loss concealment techniques for overlapped transform based codecs.]
3.5 FORWARD ERROR CORRECTION TECHNIQUES FOR SPEECH
In the previous sections, we discussed several error concealment techniques aimed at alleviating the consequences of packet losses. Some of these techniques are reasonably effective and provide adequate speech quality, especially at low loss rates. Nevertheless, as the loss rate increases, concealment becomes increasingly difficult and tends to leave artifacts. For this reason, Forward Error Correction (FEC) is often used against packet losses, either in isolation or as a complementary measure.
FEC techniques can range from simple packet replication techniques to more elaborate schemes, including media-dependent FEC. In this section, we discuss media-dependent FEC and present a framework for optimum rate distortion bit allocation. We will also present a case study based on the AMR-WB codec [6]. More general FEC methods can be found in Chapters 7 and 9.
3.5.1 Delay and FEC
Generally speaking, FEC schemes allow the receiver to correctly decode a message, even if some of the packets are lost. This is done by adding redundant information to the stream. The information can be included in a separate packet, or appended to existing packets. For example, one could send a parity packet after every three data packets, as illustrated in Figure 3.6.
FIGURE 3.6: FEC example with a 4:3 redundancy. Each fourth block is an XOR of the previous three blocks.
In this scheme, if one of the three packets is lost, one can use the parity packet to recover the original information without loss. This increase in robustness is useful, but it also increases the bandwidth requirement by 33% (by sending one extra packet for every three original packets). Furthermore, there is also a delay cost: if the first of the three packets is lost, the receiver has to wait until receiving the parity packet before decoding the lost packet. In this example, this would add an extra two-frame delay. Partially to reduce this added delay, most FEC schemes for real-time communication simply repeat the packet.
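The parity scheme above can be sketched in a few lines of Python. This is a minimal illustration, not a description of any particular standard: the helper names `xor_bytes`, `make_parity`, and `recover` are hypothetical, and all blocks in a group are assumed to have the same length (a real system would pad shorter packets).

```python
from functools import reduce


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Byte-wise XOR of two equal-length blocks."""
    return bytes(x ^ y for x, y in zip(a, b))


def make_parity(blocks):
    """Parity block: the XOR of all data blocks in the group."""
    return reduce(xor_bytes, blocks)


def recover(received, parity):
    """Rebuild the single missing block (marked None) in a group.

    A single parity block can repair at most one loss, so more than
    one missing block raises an error.
    """
    missing = [i for i, block in enumerate(received) if block is None]
    if len(missing) > 1:
        raise ValueError("single parity cannot repair more than one loss")
    if not missing:
        return list(received)
    present = [block for block in received if block is not None]
    repaired = list(received)
    # XOR of the parity with the surviving blocks yields the lost block.
    repaired[missing[0]] = reduce(xor_bytes, present, parity)
    return repaired


# Three data blocks plus one parity block, as in the 4:3 example.
group = [b"frame-01", b"frame-02", b"frame-03"]
parity = make_parity(group)

# The first block is lost; it can only be rebuilt once the parity arrives.
damaged = [None, group[1], group[2]]
restored = recover(damaged, parity)
```

Note that `recover` needs every surviving block of the group and the parity block before it can run, which is the source of the delay cost discussed above.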
Standard FEC techniques are discussed in more detail in Chapters 7 and 9. For now, we simply note that an FEC code that spans N blocks will add up to N blocks of delay. For this reason, it is highly desirable for FEC codes to span the smallest possible number of blocks.
3.5.2 Media-Dependent FEC
As mentioned above, it is desirable that the FEC technique introduce as little extra delay as possible. Ideally, we would like FEC codes that span only a single block. Unfortunately, among traditional FEC techniques, the only such "code" available is packet repetition. This is because traditional FEC techniques try to protect the bits of the message. When sending media, protecting individual bits is no longer as important; instead, the goal is to protect the signal. In other words, a rate-distortion trade-off can now be applied. From this point of view, packet repetition is clearly suboptimal. For example, in a 10% loss scenario, the error correction information is used only 10% of the time, and yet it consumes the same rate as the primary packet.
In traditional FEC codes, the sender inserts bit redundancy in the transmitted packets, and the receiver either receives the frame perfectly or receives nothing. There is no rate-distortion trade-off. In media-dependent FEC methods, in contrast, the transmitter sends multiple descriptions of the same frame, so that in case of packet loss, another packet containing the same frame, albeit at a different quality, can be used to recover from the loss. Hence, each packet carries an appropriate representation of the current frame, along with a coarse representation of one or more previous frames.
Clearly, there is a trade-off in allocating rate to redundant information rather than to the current frame. By increasing the amount of redundant information, we increase the probability and the quality of loss recovery, at the expense of the quality of the most recent frame.
An example of such media-dependent FEC schemes is the one presented in [17]. Earlier work includes the Robust Audio Tool [18], which limits the repeat packet to be the same as the original one. The problem can be formulated as follows. Given a model for the channel and a total transmission rate R (i.e., fixed packet size), what is the optimum partition of the bit budget between redundant and current frames such that a distortion measure DT is minimized?
We consider each frame to be a signal segment; each packet may contain information units regarding one or more frames. The units can contain raw data or a representation of the data derived by some compression algorithm (e.g., LPC coefficients, prediction errors). We model each packet as a collection of multiple units corresponding to different segments of the signal, each possibly having a different rate. For each packet, r1 is the rate of the present segment and ri is the rate of the (i - 1)th past segment. The number of these units and the rate of each unit can either be fixed by the optimization algorithm prior to transmission or be changed adaptively based on the input signal. Figure 3.7 shows an example with four consecutive packets, each carrying information about the current frame as well as lower-fidelity information about the two previous frames.
FIGURE 3.7: Media-aware FEC example with a factor of 3 redundancy. The current block carries the frame with full resolution and previous blocks with decreasing degree of accuracy.
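The packet layout of Figure 3.7 can be sketched as a small data structure. This is only an illustration under stated assumptions: the class and function names (`Unit`, `Packet`, `build_packet`, `best_description`) are hypothetical, and the per-unit rates in `RATES` are made-up values chosen to show decreasing fidelity, not figures taken from any codec.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Unit:
    frame_index: int  # which frame this unit describes
    rate_bits: int    # bits spent on this description (r1, r2, ...)


@dataclass
class Packet:
    seq: int
    units: List[Unit]  # units[0] is the current frame, then past frames


def build_packet(seq: int, rates: List[int]) -> Packet:
    """Packet `seq` carries frame `seq` at rates[0] and the previous
    frames at rates[1], rates[2], ... (frames before 0 do not exist)."""
    units = [Unit(seq - i, r) for i, r in enumerate(rates) if seq - i >= 0]
    return Packet(seq, units)


def best_description(frame: int, received: List[Optional[Packet]]) -> Optional[Unit]:
    """Highest-rate unit for `frame` among the packets that arrived.

    Lost packets are represented by None. Returns None if no arriving
    packet carries the frame, in which case concealment would be needed.
    """
    candidates = [
        unit
        for packet in received
        if packet is not None
        for unit in packet.units
        if unit.frame_index == frame
    ]
    return max(candidates, key=lambda unit: unit.rate_bits, default=None)


# Hypothetical rates: current frame, previous frame, frame before that.
RATES = [160, 64, 32]
stream = [build_packet(n, RATES) for n in range(4)]

# Packet 1 is lost: frame 1 is recovered from the coarser copy in packet 2.
received = [stream[0], None, stream[2], stream[3]]
recovered = best_description(1, received)
```

In this sketch the receiver must wait for later packets before it can use a redundant copy, so the number of past frames carried per packet also bounds the extra delay the scheme can introduce.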
Another point of interest is whether each unit is dependent on previous units (i.e., differential coding). We will analyze here the case in which each segment of data is processed independently. This would be the case, for example, of encoding video with all I-frames or encoding speech using G.722.1 ("Siren") or G.711 (PCM). The case of history-dependent algorithms, where each segment is sent as a unit, is handled in detail in [19].
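For independently coded segments, the bit allocation question can be explored with a toy numerical sketch. Everything below is an assumption made for illustration and is not the formulation of [17]: packet losses are taken to be independent with probability `loss_prob`, the distortion-rate curve is the textbook model D(r) = 2^(-2r/N) normalized so that a completely lost frame has distortion 1, each packet carries only two units (r1 for the current frame, r2 for the previous one, with r1 + r2 = R), and the optimum is found by brute-force search.

```python
SAMPLES_PER_FRAME = 160  # hypothetical frame length N


def distortion(rate_bits: float) -> float:
    """Assumed model D(r) = 2^(-2r/N); D(0) = 1 means the frame is lost."""
    return 2.0 ** (-2.0 * rate_bits / SAMPLES_PER_FRAME)


def expected_distortion(rates, loss_prob: float) -> float:
    """Expected per-frame distortion under independent packet losses.

    rates[0] is the rate of the primary description and rates[i] the rate
    of the copy sent i packets later. The rates are assumed non-increasing,
    so the first copy that arrives is also the best one available.
    """
    total = 0.0
    all_earlier_lost = 1.0
    for rate in rates:
        total += all_earlier_lost * (1.0 - loss_prob) * distortion(rate)
        all_earlier_lost *= loss_prob
    # Every packet carrying the frame was lost.
    return total + all_earlier_lost * distortion(0)


def best_split(total_bits: int, loss_prob: float, step: int = 8):
    """Search all (r1, r2) with r1 + r2 = total_bits and r1 >= r2."""
    candidates = (
        (total_bits - r2, r2) for r2 in range(0, total_bits // 2 + 1, step)
    )
    return min(candidates, key=lambda rates: expected_distortion(rates, loss_prob))


R = 320  # hypothetical packet size in bits
no_loss_split = best_split(R, loss_prob=0.0)
lossy_split = best_split(R, loss_prob=0.1)
equal_split = (R // 2, R // 2)  # fixed-rate analogue of packet repetition
```

Under these assumptions, the search spends nothing on redundancy when there is no loss, and at 10% loss it assigns the redundant copy a small fraction of the budget rather than half of it, which is consistent with the earlier observation that plain repetition is suboptimal.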