ARM's general-purpose processor cores have long been used alongside DSP processors in products like cell phones, where the ARM core typically handles tasks like packet processing, user interface, and overall control, and the DSP handles the computationally demanding signal processing. But as ARM has gradually upgraded its cores with DSP-oriented features, more chip and system designers are considering whether to use an ARM core as a DSP engine. The question is, how much signal processing work can an ARM core handle?
In this article, we present our independent benchmark results for members of ARM's ARM11 core family, and look at these cores' performance on common DSP algorithms and on video decoding. We analyze and compare the ARM11's performance to that of earlier ARM cores and selected DSP processors.
ARM Moves Towards DSP
ARM cores are low-cost CPUs used in a huge range of embedded products. One of the key advantages of ARM processors is their ubiquity: many engineers know how to program them, and they have strong third-party support. The ARM7, the oldest of the ARM cores discussed here, was poorly suited to signal processing. It lacked a single-cycle multiplier, and it used a Von Neumann memory architecture, which doesn't allow data and instructions to be fetched simultaneously. As a result, its signal processing performance was poor relative to that of DSP processors. This wasn't surprising, since the ARM7 was designed as a pure CPU, not as a signal processing engine. Nevertheless, designers have used the ARM7 for simple signal processing, because it's readily available, it's already in their products, and it's sometimes easier to implement signal processing on the processor you already have than to add a new one. Of course, using an ARM7 for signal processing makes sense only for applications with relatively low computational demands; the ARM7 just isn't powerful enough to handle even moderately demanding signal processing.
As signal processing has become increasingly important in embedded applications, ARM has responded by enhancing its architectures with DSP-oriented features. The ARM9, for example, moved to a Harvard memory architecture that can fetch instructions and data simultaneously, and the ARM9E added DSP-oriented instructions such as 16-bit multiply-accumulate and saturating arithmetic. Together with their higher clock speeds, these features improve the signal processing capabilities of the ARM9 and ARM9E relative to the ARM7, though these processors still offer only modest signal processing performance.
The ARM11, one of ARM's newer core families, represents a significant upgrade over earlier ARM architectures in terms of signal processing features. Of particular interest are the new 8-bit and 16-bit SIMD (single instruction, multiple data) instructions, which are intended to accelerate video and audio processing. (ARM refers to these instructions as media processing extensions.) The ARM11 includes a 64-bit data bus (vs. 32 bits on earlier ARM processors) to help make use of these additional computational capabilities. The ARM11 also has a deeper pipeline, enabling higher clock rates. As we explain in more detail below, all of these factors combine to give the ARM11 significantly better signal processing performance than earlier ARM cores.
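To make the SIMD idea concrete, the following C sketch models in portable code what two such instructions do. The choice of operations is our assumption, not something the benchmarks single out: ARMv6, the architecture the ARM11 implements, includes a dual 16-bit multiply-accumulate (SMLAD) and a four-way 8-bit sum of absolute differences (USAD8). The function names below are hypothetical, and the models ignore details such as SMLAD's overflow (Q) flag. On an ARM11, the work in each function is done by a single instruction; in plain C it takes several operations.

```c
#include <stdint.h>

/* Hypothetical portable models of two ARMv6 media (SIMD) operations.
 * A 32-bit word is treated either as two signed 16-bit lanes or as
 * four unsigned 8-bit lanes, as the ARM11's SIMD instructions do. */

/* Sign-extend the low 16 bits of w without implementation-defined casts. */
static int32_t lane16(uint32_t w)
{
    int32_t v = (int32_t)(w & 0xFFFFu);
    return v >= 0x8000 ? v - 0x10000 : v;
}

/* Pack two signed 16-bit values into one 32-bit word (hi:lo). */
uint32_t pack16(int16_t hi, int16_t lo)
{
    return ((uint32_t)(uint16_t)hi << 16) | (uint16_t)lo;
}

/* Dual 16-bit multiply-accumulate in the style of SMLAD:
 * acc + x.lo * y.lo + x.hi * y.hi, wrapping modulo 2^32.
 * The Q (overflow) flag set by the real instruction is not modeled. */
int32_t dual_mac16(uint32_t x, uint32_t y, int32_t acc)
{
    uint32_t sum = (uint32_t)acc
                 + (uint32_t)(lane16(x) * lane16(y))
                 + (uint32_t)(lane16(x >> 16) * lane16(y >> 16));
    return (int32_t)sum; /* two's-complement wrap on mainstream compilers */
}

/* Four-way unsigned 8-bit sum of absolute differences in the style of USAD8. */
uint32_t sad4_u8(uint32_t a, uint32_t b)
{
    uint32_t sad = 0;
    for (int i = 0; i < 4; i++) {
        uint32_t pa = (a >> (8 * i)) & 0xFFu;
        uint32_t pb = (b >> (8 * i)) & 0xFFu;
        sad += pa > pb ? pa - pb : pb - pa;
    }
    return sad;
}
```

For example, `dual_mac16(pack16(3, -2), pack16(4, 5), 10)` computes 10 + (-2)(5) + (3)(4) = 12. A filter loop built on the dual multiply-accumulate needs half as many multiply steps as one using a single 16-bit multiplier, and the 8-bit sum of absolute differences is a building block of block matching in video coding.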
Assessing the ARM11
BDTI has assessed the digital signal processing performance of the ARM11 using two of its benchmark suites. The first results we'll present are for the BDTI DSP Kernel Benchmarks, which consist of 12 common DSP algorithms (such as FIR filters and FFTs). Each algorithm is carefully optimized for each target processor, mirroring how such functions are typically implemented in signal processing applications. A processor's results on the 12 kernel benchmarks are used to evaluate its speed, energy efficiency, memory efficiency, and cost-performance, and to generate its BDTImark2000™ score. The BDTImark2000 is an overall DSP speed metric; a higher score indicates a faster processor. Figure 1 shows the BDTImark2000 results for the ARM1176 and ARM1136 alongside those of several other processors. (BDTImark2000 scores for additional processors are available on BDTI's web site, at http://www.bdti.com/bdtimark/BDTImark2000.htm.)
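For readers unfamiliar with these kernels, the following C sketch shows a generic block FIR filter on 16-bit fixed-point (Q15) data. It is our illustration, not BDTI's benchmark code or any vendor's optimized implementation; the function name, data layout, and saturation behavior are assumptions. Its inner loop performs one multiply-accumulate per filter tap, which is why multiplier throughput and memory bandwidth dominate FIR performance.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical reference block FIR filter on Q15 fixed-point samples.
 *   y[n] = sum_{k=0}^{ntaps-1} h[k] * x[n + ntaps - 1 - k]
 * x must hold nout + ntaps - 1 samples: ntaps - 1 samples of history
 * followed by nout new input samples. */
void fir_q15(const int16_t *x, const int16_t *h, int16_t *y,
             size_t nout, size_t ntaps)
{
    for (size_t n = 0; n < nout; n++) {
        int64_t acc = 0; /* wide accumulator, like a DSP's guard bits */
        for (size_t k = 0; k < ntaps; k++) {
            /* One multiply-accumulate per tap: the operation that a
               single-cycle multiplier and parallel data fetches speed up. */
            acc += (int32_t)h[k] * x[n + ntaps - 1 - k];
        }
        acc >>= 15; /* Q30 sum of products back to Q15 (arithmetic shift assumed) */
        if (acc > INT16_MAX) acc = INT16_MAX; /* saturate instead of wrapping */
        if (acc < INT16_MIN) acc = INT16_MIN;
        y[n] = (int16_t)acc;
    }
}
```

With two taps of 0.5 (16384 in Q15) and input {100, 200, 300}, the filter produces the running averages {150, 250}. An optimized version for a particular processor would restructure this loop, for example by unrolling it or by pairing 16-bit samples for SIMD multiply-accumulates; this reference form only shows the arithmetic.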

Figure 1: Certified BDTImark2000™ results. The BDTImark2000 is an overall measure of processors' signal processing speed, based on the BDTI DSP Kernel Benchmarks. BDTIsimMark2000 scores are measured on simulators rather than on hardware, and may use projected clock speeds.
As shown in Figure 1, the architectural changes and higher clock speed make the ARM11 cores significantly faster than the ARM9E, and slightly faster than the MIPS24KEc (a DSP-enhanced MIPS core). The ARM11 isn't the fastest core shown here; not surprisingly, the CEVA-X1620—which is a high-performance DSP core—is much faster. The ARM11 is, however, within striking range of the speed of TI's low-cost, low-power DSP architecture, the TMS320C55x, which is commonly used in applications like cellular telephones. And more generally, the ARM11 is fast enough that it's possible to use it as a stand-alone signal processing engine for moderately demanding applications.
The ARM11 has to run at a faster clock rate than the TI TMS320C55x to achieve similar signal processing performance, which suggests that the ARM11's energy efficiency will be lower: completing the same work in more clock cycles generally costs more energy. And to match the speed of the CEVA DSP core, the ARM11 would have to run at roughly twice the CEVA's clock speed. Figure 2 shows the relative DSP energy efficiency of the ARM11 cores alongside that of a typical high-performance DSP core.
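The clock-rate argument can be written out explicitly. In the relations below, the symbols are ours rather than BDTI's: $N_{\text{cyc}}$ is the number of cycles a processor needs for one execution of a task, $R$ is the required task rate (executions per second), $E_{\text{cyc}}$ is the average energy the processor consumes per clock cycle, and $P$ is its average power. The sketch assumes that energy per cycle is roughly constant for a given core at a given supply voltage.

$$
f_{\text{req}} = N_{\text{cyc}} \, R, \qquad
P = E_{\text{cyc}} \, f_{\text{req}}, \qquad
E_{\text{task}} = \frac{P}{R} = N_{\text{cyc}} \, E_{\text{cyc}}
$$

For example, a core that needs twice as many cycles as a DSP for the same task must be clocked twice as fast to keep up, and it uses less energy per task only if each of its cycles costs less than half as much energy as one of the DSP's cycles.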

Figure 2: DSP energy efficiency, based on BDTI's DSP Kernel Benchmark results.
There are two sides to the benchmark results shown in Figure 1 and Figure 2. In ARM's favor, it's clear that the ARM11 cores have sufficient signal processing performance for a significant range of applications. On the other hand, using a dedicated DSP core can yield higher signal processing performance and superior energy efficiency compared to running signal processing tasks on an ARM11. Of course, factors other than performance and efficiency often play a central role in processor selection decisions. For example, as we described earlier, a CPU is often required for non-signal-processing functions (as it is in a cell phone), and it may be desirable to recruit that processor to run the required signal processing tasks rather than adding a separate DSP processor. Using one core instead of two means a single software development environment, a simpler system design, and possibly a simpler programming model. (For many of the same reasons, it's sometimes desirable to replace a DSP core with a second CPU.) The BDTI DSP Kernel Benchmark results indicate that an ARM11 core can play that dual role in some applications, since its overall signal processing speed is similar to that of low-cost DSP processors.