Configuring Success, Minimizing Risk: Upstart DSP Architectures Aim at Elegance, Ease of Use





TechOnline

Design specs that increasingly demand high performance and low power consumption are forcing IC designers to explore relatively uncharted architectural territory. Dual-core designs and multi-million-gate FPGAs are unfamiliar enough for most design teams. Few options are more bewildering than configurable processors—especially when signal processing is involved.

It's clear that cranking up the clock rate on fixed-function, programmable DSPs can deliver virtually any performance goal. It's equally clear that low-power applications (virtually anything with mobile in its name, for example) need to run cool. Top-flight design teams can do a lot with aggressive power management, even with fixed-architecture DSPs. But power management techniques such as sleep modes, clock gating, and dynamically changing clock speeds are reaching a point of diminishing returns. Moreover, implementing them intelligently introduces design complexity and verification headaches.

That's exactly the kind of scenario that makes designers and architects consider enduring the pain of adopting a new technology: designer-configured processors—and even processors that configure themselves at run time to execute specific algorithms.

Performance Breakthroughs
There's little doubt about the performance gains. When a design team takes the time to profile how specific tasks consume a processor's time while it runs a complete application, it's not unusual in a digital signal processing application to see 80% of the processor's cycles expended in algorithm execution and 20% in control-oriented functions.

Standard DSPs execute those algorithms as program code and—in some instances—allow the creation of special instructions that reduce the number of clock cycles to run the algorithm. When the chip is configured by the designer to execute the algorithm in hardware, however, performance gains can be impressive. Not much can be done about the 20% of the processor's time that is allotted to control functions. But if the signal-processing 80% could be cut by a factor of four, overall performance would be increased by 2.5X.
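The 2.5X figure follows from simple arithmetic on the 80/20 split, in the style of Amdahl's law. The short Python sketch below works through it. The function name and parameters are illustrative assumptions, not part of any vendor's tools.

```python
def overall_speedup(accelerated_fraction: float, factor: float) -> float:
    """Estimate whole-application speedup when only part of the work gets faster.

    accelerated_fraction: share of cycles spent in the accelerated code (0..1).
    factor: how many times faster that share runs after acceleration.
    """
    if not 0.0 <= accelerated_fraction <= 1.0:
        raise ValueError("accelerated_fraction must be between 0 and 1")
    if factor <= 0:
        raise ValueError("factor must be positive")
    # Time remaining = untouched control code + shrunken signal-processing code.
    remaining = (1.0 - accelerated_fraction) + accelerated_fraction / factor
    return 1.0 / remaining


# The article's case: 80% signal processing, cut by a factor of four.
speedup = overall_speedup(0.80, 4.0)
print(f"{speedup:.2f}X")  # prints "2.50X"
```

The same arithmetic shows the ceiling: with 20% of the cycles untouched, even infinitely fast algorithm hardware could not push the overall gain past 5X.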

Benchmark results from the vendor-independent Embedded Microprocessor Benchmark Consortium's (EEMBC) Telecom suite, for example, show that as of early this year Improv Systems' designer-configurable Jazz Telecom XT processor leads the pack by a considerable distance compared to nine other DSP-oriented processors.

In fact, as a group, designer-configurable DSPs typically led fixed-function, programmable architectures at comparable clock speeds. The benchmarks show how the various processors perform on a per-cycle basis when optimized for the Telecom suite using assembly language and special instructions.

The figure of merit in this case is the Telemark, EEMBC's consolidated performance metric for the entire suite. The Telemark score makes sense only when compared to the performance of other processors running the same benchmark suite.

Measured by throughput per clock cycle, the Jazz processor is, for example, twelve times more efficient than Texas Instruments' fixed-function TMS320C62 and six times more efficient than the Tensilica Xtensa T1050, another designer-configurable processor (Figure 1). In the comparison cited above, this means the optimized Improv processor was 6 to 12 times more efficient per clock cycle than those competitors; it does not mean that it boosted overall performance 6 to 12X.


Figure 1:  Throughput for EEMBC optimized telecom suite
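Why per-cycle efficiency is not the same as overall performance can be shown with a few lines of Python. This is only a sketch: the function name is an assumption, and the efficiency and clock values are hypothetical placeholders, not EEMBC results.

```python
def absolute_throughput(per_cycle_efficiency: float, clock_mhz: float) -> float:
    """Absolute throughput scales with both work per cycle and cycles per second."""
    return per_cycle_efficiency * clock_mhz


# Hypothetical placeholder values, chosen only to illustrate the arithmetic.
efficient_core = absolute_throughput(per_cycle_efficiency=12.0, clock_mhz=100.0)
baseline_core = absolute_throughput(per_cycle_efficiency=1.0, clock_mhz=300.0)

# 12X the work per cycle at one third the clock is a 4X gain overall.
ratio = efficient_core / baseline_core
print(f"{ratio:.1f}X")  # prints "4.0X"
```

A per-cycle advantage translates one-for-one into overall performance only when the processors being compared run at the same clock rate.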

Risk Aversion
But performance is not where the story ends. SoC design-team managers and vice presidents of engineering must consider the costs and risks involved in moving to a new architecture supported by a relatively new company. Jeff Bier, general manager of Berkeley Design Technology Inc., a Berkeley, CA-based DSP consulting and benchmarking company, sees four general cost categories:

  • Custom-Chip Design
    Simply stated, there are economic reasons why most signal-processing-intensive designs are implemented on standard-part, fixed-function DSPs from vendors such as Texas Instruments, Analog Devices, and Motorola. Customizing a processor usually requires building a custom chip. The millions of dollars required for such designs eliminate most OEM companies from considering a customized processor, says Bier, because relatively few OEMs can achieve the volumes required to recoup this investment. Moreover, the application requirements must be sufficiently stable that the OEM can have confidence that the chip won't become obsolete before it's even fabricated. Processors that are configurable at run time avoid these issues, but present other challenges.

  • Software Development Tools
    The very fact that a configurable architecture can be changed increases the challenges of creating software development tools that are both stable and sophisticated. Even for well-established standard processors, says Bier, tools have difficulty keeping pace with architectural changes and user requirements. When vendors field a new architecture, they must pay a lot of attention to development tools.

  • Compatibility
    Porting legacy code to a new architecture can be both expensive and time-consuming, says Bier. Legacy C code can be recompiled, of course, but the new architecture's compiler has to be very efficient if it hopes to maintain the performance gains that justify the migration to a new architecture in the first place. And critical parts of the legacy code written in assembly language must be rewritten. The task is not insurmountable, but it can be daunting for OEMs that have grown very comfortable with an established processor architecture.

  • Roadmap Risk
    Future product generations of any SoC design depend on the long-term viability of the processor architecture adopted for the design as well as the company behind it. Yet most—if not all—companies offering configurable architectures are small, startup companies with uncertain futures. Which of these companies, asks Bier, will survive? Which will be acquired? Which will shift their product strategies? Design team managers and vice presidents of engineering must consider the vendor's long-term prospects.

The Strongest Survive
The semiconductor industry's dynamism is driven in large part by new companies introducing new technologies and/or architectures. Such has been the case for companies from (alphabetically) ARM and its RISC architecture to Xilinx and FPGA technology, with companies such as Intel (x86 architecture) in between.

So it is not surprising that with the industry at still another price/performance/power crossroads, entrepreneurial companies such as Tensilica, Improv Systems, 3DSP, Elixent, and Morpho Technologies—to name just a few—smell an opportunity to be the next ARM.

Nor is it surprising that household names in the semiconductor industry such as Motorola SPS, Philips Semiconductors, NEC, and ARM are either licensing the IP of companies with configurable DSP processors or investing in the companies themselves.

As we will see, each player has a different take on how the configurable DSP game should be played.

Tensilica's Xtensa
Tensilica offers designers a configurable, extensible, and synthesizable processor core specifically created for SoC designs. The Xtensa processor core is based on the premise that configurability can be achieved by replacing hardwired state machines with firmware-controlled datapaths.

The heart of Tensilica's technology is a processor generator that creates custom processors along with compatible software, modeling, and EDA support. Optimized task processors are combined to handle specific computing tasks and because they are processors—not hardwired RTL logic—they can be programmed (Figure 2). For DSP-specific applications, Tensilica offers a set of Xtensa extensions called Vectra.


Figure 2:  Tensilica's Xtensa utilizes multiple task processors

Audio codecs used in GSM cell phones provide a classic example of utilizing a task processor. Analyzing the algorithm reveals that 80% of the processor's cycles are devoted to multiplications. Adding a hardware multiplier as a task processor yields a 7X improvement in executing the GSM audio codec code.

New instructions can also be added to turbocharge performance. For example, using Tensilica's TIE (Tensilica Instruction Extension) language, a designer can add a Viterbi butterfly instruction, along with the corresponding pipeline hardware, to the instruction set.
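For readers unfamiliar with the operation, a Viterbi butterfly is the add-compare-select step at the heart of a Viterbi decoder. It is executed many times for every decoded bit, which is why it is a natural candidate for a dedicated instruction. The Python sketch below shows what one butterfly computes in software. It is not TIE code and not Tensilica's implementation; the function name, the maximize-the-metric convention, and the symmetric +bm/-bm branch metrics are assumptions made for illustration.

```python
def viterbi_butterfly(pm0: int, pm1: int, bm: int) -> tuple[int, int, int, int]:
    """One add-compare-select butterfly (illustrative software model).

    pm0, pm1: path metrics of the two predecessor states.
    bm: branch metric; the complementary branch is assumed to carry -bm.
    Returns (metric_a, decision_a, metric_b, decision_b), where a decision of 0
    means the survivor came from predecessor 0 and 1 means predecessor 1.
    """
    # Add: extend both predecessor paths into each of the two successor states.
    a_from_0, a_from_1 = pm0 + bm, pm1 - bm
    b_from_0, b_from_1 = pm0 - bm, pm1 + bm

    # Compare and select: keep the larger metric and record which path won.
    decision_a = 0 if a_from_0 >= a_from_1 else 1
    decision_b = 0 if b_from_0 >= b_from_1 else 1
    return max(a_from_0, a_from_1), decision_a, max(b_from_0, b_from_1), decision_b


print(viterbi_butterfly(10, 7, 2))  # prints "(12, 0, 9, 1)"
```

In software this takes four additions, two comparisons, and two selections. A custom instruction with matching pipeline hardware could perform all of them in parallel.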

From a base architecture perspective, Xtensa implements a five-stage pipeline with 32-bit standard register widths. The Xtensa instruction set delivers high code density with compact 16/24-bit native instruction coding. The clock runs at 200 MHz in the 0.18-µm process and 300-350 MHz in the 0.13-µm process.

Improv Systems' Jazz
Improv Systems' Jazz processor product offering is based on creating data-flow engines specifically for signal processing applications. The basic processor uses a VLIW (very long instruction word) architecture that has a number of degrees of freedom, says Vice President of Marketing Victor Berman.

The VLIW word can be narrow or wide, depending on the configuration of that specific processor, and the architecture makes available 16 VLIW "slots," all running in parallel. These can be thought of as accelerator units.

A small number of slots can be configured to create a low-power processor that handles a relatively light computing load. A large number can be configured into a formidable multiprocessor system. Power consumption is low in either case because the number of clock cycles to run the application is reduced.

Improv's impressive performance in the EEMBC benchmark resulted from profiling the Telecom suite as an application and creating a custom processor to run it. In the Improv architecture, hardware accelerator units look like instructions to the operating system.

Like Tensilica, Improv has paid a great deal of attention to its development tools, particularly its performance analysis tool. There's a heavy duality between software and hardware in signal processing applications, says Berman, and efficient data flow requires optimization of both.

In an iterative process, says Berman, the results from an initial configuration are fed into the compiler, which produces a new executable, and the cycle repeats until the data flow engines are optimized.

A few other companies that have launched configurable processing architectures are 3DSP, Elixent, and Morpho Technologies. Traditional FPGA companies can also be included in the list because their architectures are run-time configurable. Designers who take this route must be prepared, however, for higher power consumption, reduced performance, and larger die size due to routing overhead.

Architectural Cornucopia
3DSP's SP20-UniPHY processor is tuned for physical-layer signal processing for applications such as wireless LANs, home networking, xDSL, and cable modems. Its defining architectural feature is SIMD (single instruction, multiple data). By utilizing up to 12 SIMD instruction units, it can execute 12 operations per clock cycle. It also features datapath registers to boost bandwidth and a technology called SoftDatapaths—configurable and software-programmable datapaths.

U.K.-based Elixent relies on a technology it calls Reconfigurable Algorithm Processing (RAP) to deliver run-time configurability. A switch-fabric array called D-Fabrix is created by combining two 4-bit ALUs, two registers, and two "switchboxes" into a basic building block or tile. Hundreds of thousands of tiles create the D-Fabrix array.

Algorithms are mapped onto the hardware by selecting the right combination of tiles and describing the flow in VHDL, Verilog, or a higher-level language like Handel-C or Matlab. The tiles are connected by the switchbox network. The "virtual hardware" can be reprogrammed at any time by organizing the array with new code generated by Matlab, for example.

Risk versus Reward
A few quick visits to the Web sites of the companies mentioned so far supply convincing evidence that configurable DSP technologies are being explored in earnest by leading-edge SoC designers. Reports of partnerships with large semiconductor houses, licensing deals, and design wins are plentiful.

But there may soon be a middle road. Texas Instruments, the 800-pound gorilla of DSPs, for example, has noted that configurable processors are most likely to gain acceptance as accelerator blocks in platform SoCs—but TI has not made any "accelerator" announcements of its own yet.

Programmable DSPs remain TI's technology platform of choice both in its standard parts business (where there is no other choice) and for its OMAP SoC platform. Risks exceed rewards in TI's view. Given its strong market position, TI can also afford to wait and see which of the several configurable architectures succeeds, if any. On the other hand, its traditional DSP competitors (Motorola, Philips, Infineon, and others) look to partnerships with "configurable DSP" companies as the road to a breakthrough technology that could help them challenge TI's market dominance.

That makes the July 2003 acquisition by ARM of Belgium-based Adelante Technologies and its A|RT coprocessor development tool business particularly significant. Using A|RT, designers in the past could configure application-specific data engines to accelerate algorithm execution.

But the technology that will eventually be offered to ARM customers will not be that flexible. Instead, design teams will be able to license a coprocessor template architecture and the tools that will allow them to customize the template. This will put ARM in the designer-customizable accelerator business without the full spectrum of tool and verification headaches imposed by complete flexibility. ARM reportedly also plans to offer pre-configured data engines optimized for standard signal-processing functions such as inverse DCTs or Huffman algorithms. These would simply be programmable.

On the other hand, ARM's policy of cooperation with other IP vendors is still intact. The bottom line for a design team is that it could opt for a true dual processor architecture using ARM RISC and a core from any of a number of companies, including Improv and 3DSP, which have hooks to ARM's AMBA and AHB buses. Or, for lighter duty signal processing applications, the design team could go the acceleration route.

In any event, configurable processing as a technology is not likely to disappear entirely from the SoC design engineer's tool kit. Only the mode of utilization is in doubt.


About the Author
Contributing writer Jack Shandle is a former chief editor of both Electronic Design magazine and ChipCenter.com. He holds a BSEE degree and has written hundreds of articles on all aspects of the electronics OEM industry. Jack is president of eContentWorks, a consultancy that creates high-value content for publishers, eOEM corporations, and industry associations. His email address is jshandle@earthlink.net.



 






