C-based coprocessor design, part 1: SIMD architecture

Here's how CebaTech's C2R C-to-RTL compiler was used to implement a G723.1 and G729.A speech coding accelerator. The accelerator features configurable micro-architecture and instruction-set architecture.

DSP DesignLine

Part 2 shows how the C2R C-to-RTL compiler was used to customize and validate the datapath.

Programmable architectures, including micro-coded data-parallel accelerators, are the backbone processing engines in high performance ASICs. Traditionally, such architectures have been implemented at register transfer level (RTL), as this level of abstraction is sufficiently close to the actual hardware architecture and is fully supported by the mainstream ASIC and FPGA synthesis flows.

With the introduction of disruptive electronic system level (ESL) synthesis tools such as CebaTech Inc.'s C2R Compiler, large-scale accelerators can be described at a higher abstraction level. At the same time, the processor architect maintains full control over the ESL synthesis process through advanced features such as precise interface inference, user-specified clocking, explicit data-level (DLP) and thread-level (TLP) parallelism, and combinatorial logic.

This article elaborates on the use of the C2R compiler for implementing a 2-way LIW/SIMD hybrid accelerator, attached to a scalar processor core, with configurable micro-architecture and programmer's model/ISA. The accelerator was designed for the ITU-T G723.1 and G729.A speech coding standards.

Introduction
Embedded processor cores with a fixed instruction set architecture (ISA) have been widely used in the design of SoC-based embedded systems in the past. Such architectures present a good compromise for the execution of general-purpose code, such as that for user interfaces, protocol processing and embedded operating systems. However, they fall considerably short in digital signal processing (DSP), which is needed by almost all of the core algorithms of consumer-targeted SoCs.

To increase the signal processing capability of such systems, architects have utilized a number of additional embedded DSP cores, in parallel to the main scalar processor core, to accelerate the time-critical loops of the application. This acceleration comes at the expense of silicon area and creates a convoluted programming model due to the multiple address spaces, ISAs, 'mailbox-type' communications and programmer-specified, DMA-based block data movement. A possible solution to these issues is to hardwire the core DSP functionality of the consumer application. However, this involves the development and validation of thousands of lines of parallel code at RTL and results in solutions that, although of high performance, are tuned only to the task at hand and offer little or no programmability. The latter is a serious deficiency in both the consumer electronics and telecoms domains, which are characterized by short time to market, limited market windows and ever-evolving standards.

Over the past few years, a promising processing paradigm has been increasingly utilized in such high-performance SoCs. This paradigm comprises configurable, extensible processors that allow the system architect to extend both their architecture (programmer's model and ISA) and their microarchitecture (execution units, streaming engines, custom coprocessors) [1]. On top of very high performance, configurable and extensible processors offer the added advantage of post-fabrication adaptivity to evolving standards through the careful choice of the custom ISA and execution/storage resources. Applications of this paradigm range from traditional consumer domains such as video coding [2] [3] and audio processing [4] to less obvious ones such as RTOS acceleration [5].

A third proposition for the modelling--and to a lesser extent, implementation--of high performance SoCs comes from a number of vendors, in the form of co-design environments and RTL synthesis systems for ESL design languages such as SystemC [6]. This paradigm presents an interesting prospect for designing and modelling the consumer ASIC in a parallel language and, in the process, creating an executable specification for high-speed validation as well as for final implementation. It is possible, for example, to extend the SystemC concept by specifying an object-oriented system level design and implementing the flow based on the transformation of UML to SystemC [7]; or to use SystemC at the transaction level in a co-design flow to model complex SoCs [8]. Similarly, NetC has provided a means of modelling and evolving networks-on-chip while producing cycle-accurate models in SystemC, whereas SystemC has been utilized [10] as both a modelling language and an implementation medium for a high-performance network-on-chip architecture.

It is only very recently that the introduction of powerful, system level behavioural synthesis technology [11] has enabled a full, untimed C-based flow to be used directly to transform complex application sources into hardwired silicon, without the need for a single- or multi-core programmable platform. This work utilizes CebaTech's untimed ANSI C to RTL flow, along with a more traditional RTL codebase, to describe a configurable, 2-way Long Instruction Word (LIW) architecture that can adapt easily to the data and instruction parallelism typically found in consumer electronics and telecoms applications. This article describes the motivation behind the LE2 engine, the default macro- and micro-architecture and the ESL implementation of the processor. A particularly important feature of the LE2 is the ability to target diverse application domains by customizing the extended vector ISA. This customization is achieved with the new concept of 'plug-in vector datapaths'. A vector datapath used for the acceleration of the G723.1 and G729.A speech coders will be presented and discussed.
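To give a feel for what a vector datapath operation looks like when written as untimed ANSI C, the following is a minimal sketch of two lane-wise fixed-point operations of the kind speech coders rely on: a saturating 16-bit add and a saturating Q15 multiply-accumulate. It is illustrative only and is not taken from the LE2 source. The lane count `LANES`, the function names `vadd_sat` and `vmac_q15`, and the helpers `sat16` and `sat32` are all assumptions made for this example, and no C2R-specific directives are shown.

```c
#include <stdint.h>

#define LANES 4  /* hypothetical vector width, not taken from the article */

/* Clamp a 32-bit intermediate to the signed 16-bit range. */
static int16_t sat16(int32_t x)
{
    if (x > INT16_MAX) return INT16_MAX;
    if (x < INT16_MIN) return INT16_MIN;
    return (int16_t)x;
}

/* Clamp a 64-bit intermediate to the signed 32-bit range. */
static int32_t sat32(int64_t x)
{
    if (x > INT32_MAX) return INT32_MAX;
    if (x < INT32_MIN) return INT32_MIN;
    return (int32_t)x;
}

/* Lane-wise saturating add: out[i] = sat16(a[i] + b[i]). */
void vadd_sat(const int16_t a[LANES], const int16_t b[LANES],
              int16_t out[LANES])
{
    for (int i = 0; i < LANES; i++)
        out[i] = sat16((int32_t)a[i] + (int32_t)b[i]);
}

/* Lane-wise Q15 multiply-accumulate.
 * The doubled product is saturated first, then added to the
 * accumulator with a second saturation: acc[i] = sat32(acc[i] + sat32(2*a[i]*b[i])). */
void vmac_q15(const int16_t a[LANES], const int16_t b[LANES],
              int32_t acc[LANES])
{
    for (int i = 0; i < LANES; i++) {
        int32_t prod = sat32(2 * (int64_t)a[i] * (int64_t)b[i]);
        acc[i] = sat32((int64_t)acc[i] + (int64_t)prod);
    }
}
```

Each loop has a fixed trip count and no dependence between iterations, so the lanes could in principle be evaluated in parallel by hardware. Whether a particular C-to-RTL tool unrolls such a loop into parallel lanes depends on that tool and its settings.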

The novelty of this work lies in the fusion of the configurable processor and ESL implementation domains in a unique way, by using ESL as the implementation medium not only of custom SIMD extensions [12], but of a whole parallel coprocessor.
