Many emerging applications require extremely low-power DSPs. This article shows how to design such a DSP, using an electrocardiogram application as an example. We show how to achieve low power by tuning the algorithm, processor architecture, and memory system, as well as through clock gating. Throughout the article we present detailed power results to demonstrate the impact of each optimization.
Introduction
A new generation of biomedical monitoring devices is emerging. These devices are typically powered by a tiny battery or an energy scavenger, and have extremely low power budgets. A typical budget is around 100 μW for the whole system, including radio processing, data processing, and memories.
To reduce the power dissipation of the radio transmitter, system designers often employ feature extraction and/or data compression to reduce the number of bits transmitted. This shifts the power bottleneck from the radio to the data processor, which is the focus of our article. The goal of our work is to create a C-programmable, application-specific DSP optimized for low power. We use a reconfigurable processor from Philips' technology incubator Silicon Hive [4] as a starting point.
The Technology
The processor used in our work is programmed in C using a retargetable compiler. Programmability, as opposed to fixed-function hardware, is important because the digital subsystem must be able to run different algorithms, such as switching between ECG and EEG analysis. The system may also need to run new algorithms from the biomedical domain. Programmable platforms also require less development effort and result in more portable code, i.e., code that can be recompiled for other hardware platforms with minimal effort.
In this article we differentiate between dynamic and static power consumption. Dynamic power comprises the power consumed by switching nodes, the power dissipated inside cells due to short-circuit currents, and all power consumed by internal nets. It covers the functional units, memories, controller, and clock.
Static power is the leakage power lost whether the circuit is active or idle. Current CMOS technology trends indicate that leakage is becoming more dominant with every new process generation. In our experiments, leakage power was a critical factor: we measured up to 100 μW of leakage.
Our work has focused on reducing both static and dynamic power by minimizing the time the processor is active. As a case study we examine an ECG algorithm running on the proposed platform. From this example we have been able to draw more general system-level conclusions.
System-Level Architecture
A generic sensor node, as depicted in Figure 1, consists of several subsystems:
- A digital processing subsystem with level 1 local memory (L1)
- A level 2 (L2) memory subsystem (including RAM and non-volatile memories)
- An array of sensors and possibly actuators
- A radio system
- A power subsystem including a source and power manager, which is responsible for waking up various parts of the node when needed
This conceptual model can also be applied to a multi-die implementation, which leaves several packaging options open. If L2 memories are kept off-die, for example, then the size of L2 memory can be varied without creating a new sensor chip.

Figure 1. Wireless sensor node.
In current systems, power is supplied by a small battery or by energy scavengers. Battery-powered nodes have the disadvantage of requiring maintenance. Different forms of energy scavenging are possible; in this article we assume a power budget of around 100 μW [5]. This number is the power budget of the entire sensor node, including the power consumed by the radio and sensors.
From a power point of view, the biggest consumers are the radio, the memory, and the digital subsystem. Commercially available radios consume 150 nJ/bit [7], and as a consequence the transmission of raw data can be expensive. An algorithm that reduces the amount of data via compression or feature extraction is usually a good compromise. In addition to reducing the radio power consumption, most subsystems exploit duty cycling and sleep modes to reduce dissipation.
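To see why transmitting raw data is expensive, the figures quoted above can be turned into a back-of-the-envelope estimate. The sketch below multiplies the 150 nJ/bit radio cost by a sustained bit rate to get average radio power, and compares it with the 100 μW node budget. The sample rate, sample resolution, and data reduction factor are illustrative assumptions, not values from this article.

```python
# Back-of-the-envelope radio power estimate.
RADIO_ENERGY_PER_BIT_J = 150e-9  # 150 nJ/bit, as quoted in the text [7]
NODE_BUDGET_W = 100e-6           # ~100 uW budget for the whole node [5]

# Hypothetical signal parameters (assumptions for illustration only).
SAMPLE_RATE_HZ = 200     # assumed sample rate
BITS_PER_SAMPLE = 12     # assumed ADC resolution
REDUCTION_FACTOR = 10    # assumed gain from compression / feature extraction


def radio_power_w(bits_per_second, energy_per_bit_j=RADIO_ENERGY_PER_BIT_J):
    """Average radio power in watts for a sustained bit rate."""
    return bits_per_second * energy_per_bit_j


raw_bps = SAMPLE_RATE_HZ * BITS_PER_SAMPLE            # raw bit rate
raw_power = radio_power_w(raw_bps)                    # raw transmission
reduced_power = radio_power_w(raw_bps / REDUCTION_FACTOR)  # after reduction

if __name__ == "__main__":
    print(f"raw:     {raw_power * 1e6:.1f} uW")
    print(f"reduced: {reduced_power * 1e6:.1f} uW")
    print(f"budget:  {NODE_BUDGET_W * 1e6:.1f} uW")
```

Under these assumed numbers, raw transmission alone would need about 360 μW, more than three times the node budget, while a tenfold data reduction brings the radio down to about 36 μW.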
Duty cycling and sleep modes are most effective when the DSP spends a minimal number of cycles executing the algorithm, so this is an obvious area for optimization. We must also minimize the power consumed by the memory subsystem. Specifically, we need a hierarchical memory subsystem that reduces the size of the lowest-level memories, as fetching data from these memories consumes the most power.
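The link between cycle count and power can be illustrated with a first-order duty-cycle model: the processor is active for the cycles the algorithm needs each second and sleeps for the remainder. The sketch below is a simplification that ignores wake-up overhead. The clock frequency, active power, sleep power, and cycle counts are hypothetical placeholders, not measurements from this work.

```python
def average_power_w(cycles_per_second, clock_hz, active_power_w, sleep_power_w):
    """Average power of a duty-cycled processor (first-order model).

    The duty cycle is the fraction of time the processor must be active
    to execute `cycles_per_second` cycles at `clock_hz`.
    """
    duty = cycles_per_second / clock_hz
    if not 0.0 <= duty <= 1.0:
        raise ValueError("workload does not fit in the available cycles")
    return duty * active_power_w + (1.0 - duty) * sleep_power_w


# Hypothetical operating point (assumptions for illustration only).
CLOCK_HZ = 10e6          # assumed 10 MHz clock
ACTIVE_POWER_W = 1e-3    # assumed 1 mW while active
SLEEP_POWER_W = 1e-6     # assumed 1 uW while asleep

# Assumed cycle counts before and after optimizing the algorithm.
baseline = average_power_w(1_000_000, CLOCK_HZ, ACTIVE_POWER_W, SLEEP_POWER_W)
optimized = average_power_w(200_000, CLOCK_HZ, ACTIVE_POWER_W, SLEEP_POWER_W)

if __name__ == "__main__":
    print(f"baseline:  {baseline * 1e6:.2f} uW")
    print(f"optimized: {optimized * 1e6:.2f} uW")
```

In this model, average power falls almost in proportion to the number of active cycles until the sleep power becomes the dominant term, which is why both the cycle count and the leakage in sleep mode matter.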