DISPS-1.1

High-level Modeling of Switching Activity With Application to Low-power DSP System Synthesis
Magnus Lundberg (Lulea University of Technology, Lulea, Sweden), Khurram Muhammad, Kaushik Roy (Purdue University, West Lafayette, IN 47907), Sarah Kate Wilson (Lulea University of Technology, Lulea, Sweden)

We address the high-level synthesis of low-power digital signal processing (DSP) systems by proposing switching activity models. In particular, we present a technology-independent hierarchical scheme to compare the relative power performance of two competing DSP systems. The basic building blocks considered for such systems are a full adder and a one-bit delay. Estimates of the switching activity at the outputs of these building blocks are used to model the activity in the different architectural primitives used for building DSP systems. The method is fast and simple, and simulations show accuracy within 4% of extensive bit-level simulations. It can therefore be integrated easily into current communications/DSP CAD tools for low-power applications. The models show that the choice of multiplier and multiplicand is important when using array multipliers in a data path. If the input signal with the smaller variance is chosen as the multiplicand, savings of up to 20% in switching activity can be achieved. This observation is verified by analog simulation.
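As a rough illustration of why signal variance matters for switching activity, the sketch below counts bit toggles between consecutive 16-bit two's complement words for two correlated Gaussian signals of different variance. It is not the authors' full-adder model and does not simulate an array multiplier. The AR(1) signal model, the correlation of 0.9, the word length, and all function names are assumptions made for this example.

```python
import math
import random

def ar1_samples(n, std, rho, rng):
    """Zero-mean AR(1) Gaussian samples scaled to standard deviation `std`."""
    x, out = 0.0, []
    for _ in range(n):
        x = rho * x + math.sqrt(1.0 - rho * rho) * rng.gauss(0.0, 1.0)
        out.append(x * std)
    return out

def to_twos_complement(value, bits):
    """Round, clip to the representable range, and return the unsigned bit pattern."""
    lo, hi = -(1 << (bits - 1)), (1 << (bits - 1)) - 1
    clipped = max(lo, min(hi, int(round(value))))
    return clipped & ((1 << bits) - 1)

def mean_toggles(samples, bits=16):
    """Average number of bits that change between consecutive words."""
    words = [to_twos_complement(s, bits) for s in samples]
    flips = sum(bin(a ^ b).count("1") for a, b in zip(words, words[1:]))
    return flips / (len(words) - 1)

rng = random.Random(0)  # fixed seed so the run is repeatable
low = mean_toggles(ar1_samples(5000, 50.0, 0.9, rng))     # small-variance signal
high = mean_toggles(ar1_samples(5000, 5000.0, 0.9, rng))  # large-variance signal
print(f"toggles per word: low variance {low:.2f}, high variance {high:.2f}")
```

With positively correlated data, the sign-extension bits of the small-variance signal toggle only when the sign changes, so that signal shows fewer toggles per word in this toy setting.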

DISPS-1.2

Activity Models for use in Low Power, High-Level Synthesis
Russell E. Henning, Chaitali Chakrabarti (Dept of Electrical Engineering, Arizona State University, Tempe, AZ 85287)

Characteristics of the data being processed can be used to reduce the power consumption in the data path of a VLSI circuit by exploiting their relationship with transition activity during high-level synthesis. Important relationships between fixed-point, two's complement data characteristics and 0->1 transition activity in static CMOS circuits are presented in this paper. Models for computing transition activity in terms of a new set of transition parameters are developed. Propagation of data characteristics through multiplication and addition functional units is discussed. The use of the relationships and models to analyze and significantly reduce 0->1 transition activity with little computational effort is illustrated with examples.
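To make the notion of 0->1 transition activity concrete, the following minimal sketch counts rising transitions at each bit position of a sequence of fixed-point, two's complement words. The function name and the example data are hypothetical, and the sketch only measures activity by direct counting; it does not implement the transition parameters or propagation models of the paper.

```python
def rising_transitions_per_bit(samples, bits):
    """Count 0->1 transitions at each bit position (index 0 = least significant bit)."""
    mask = (1 << bits) - 1
    # Masking a negative Python int yields its two's complement bit pattern.
    words = [s & mask for s in samples]
    counts = [0] * bits
    for prev, cur in zip(words, words[1:]):
        rising = ~prev & cur & mask  # bits that were 0 and are now 1
        for k in range(bits):
            counts[k] += (rising >> k) & 1
    return counts

# 3 -> -2 -> 1 in 4 bits is 0011 -> 1110 -> 0001.
counts = rising_transitions_per_bit([3, -2, 1], bits=4)
print(counts)  # [1, 0, 1, 1]
```

The sign change from 3 to -2 raises both upper bits at once, which is the kind of sign-extension behaviour that makes data characteristics relevant to transition activity.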

DISPS-1.3

Multirate as a Hardware Paradigm
Bruce W Suter (Air Force Research Laboratory), Kenneth S Stevens (Intel), Scott R Velazquez (V Company), Truong Nguyen (Boston University)

Architecture and circuit design are the two most effective means of reducing power in CMOS VLSI. Mathematical manipulations based on ideas from multirate signal processing have been applied to create high-performance, low-power architectures. To illustrate this approach, two case studies are presented: one concerns the design of a fast Fourier transform (FFT) device, and the other the design of analog-to-digital converters.

DISPS-1.4

Unfolding Probabilistic Data-flow Graphs Under Different Timing Models
Sissades Tongsima, Timothy W O'Neil, Edwin Sha (Dept. of Computer Science and Engr., University of Notre Dame)

It is known that in many applications, because of selection statements such as if-statements, the computation time of a node can be represented by a random variable. This paper focuses on iterative applications (those containing loops) that exhibit such uncertainty. Such an application can be transformed into a probabilistic data-flow graph. A challenging problem is to derive graph transformation techniques that produce a good schedule. This paper introduces two timing models, the time-invariant and time-variant models, to characterize the nature of these applications. Furthermore, for the time-invariant model, we propose a means of selecting a minimum rate-optimal unfolding factor that guarantees the best schedule length. We also propose a good estimate for choosing an unfolding factor for a graph under the time-variant model.
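For orientation only, the sketch below covers the deterministic special case, in which every node time is a constant rather than a random variable. There the iteration bound of a data-flow graph is the maximum, over all cycles, of total computation time divided by the number of delays. A rate-optimal schedule with integer length for an unfolding factor f requires f times the bound to be an integer, so the denominator of the bound is the smallest candidate f. This is a necessary condition, not the selection procedure of the paper, and the cycle data and function names are hypothetical.

```python
from fractions import Fraction

def iteration_bound(cycles):
    """Maximum ratio of total computation time to delay count over all cycles."""
    return max(Fraction(time, delays) for time, delays in cycles)

def smallest_integral_unfolding(cycles):
    """Smallest f for which f * iteration_bound is an integer."""
    return iteration_bound(cycles).denominator

# Hypothetical cycles, each given as (total computation time, number of delays).
cycles = [(5, 2), (7, 3), (4, 2)]
bound = iteration_bound(cycles)
factor = smallest_integral_unfolding(cycles)
print(bound, factor, bound * factor)  # 5/2 2 5
```

Here the bound is 5/2 time units per iteration, so unfolding twice gives a target schedule length of 5 time units for every 2 iterations.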

DISPS-1.5

Low-power Channel Coding via Dynamic Reconfiguration
Manish Goel, Naresh R Shanbhag (University of Illinois at Urbana-Champaign)

Presented in this paper are energy-optimum reconfiguration strategies for channel codecs. These strategies are derived by solving an optimization problem that has energy consumption as the objective function and a constraint on the bit error rate (BER). The energy consumption models for the reconfigurable Reed-Solomon (RS) codec are derived via gate-level simulation of the finite field arithmetic modules. These energy models, along with the BER expressions, are then employed to derive the energy-optimum reconfiguration strategies. The energy savings are computed by comparing the energy consumption of the reconfigurable codec with that of the static codec. The energy savings range from 0% to 83% for channel signal-to-noise ratio (SNR) variations from 7 dB to 10 dB. On average, energy savings of 55% are achieved.
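The structure of such a constrained selection can be sketched as follows, under several assumptions that are not taken from the paper: the configurations are the error-correcting capabilities t of a length-n code, the reliability constraint is placed on the word error probability of a bounded-distance decoder with independent symbol errors (used here as a stand-in for the BER constraint), and the energy model is an invented linear function of t. All names and numbers are placeholders.

```python
from math import comb

def word_error_prob(n, t, p):
    """Probability that more than t of n symbols are in error (independent errors)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(t + 1, n + 1))

def pick_config(n, t_options, p, target, energy):
    """Lowest-energy t whose word error probability meets the target, or None."""
    feasible = [t for t in t_options if word_error_prob(n, t, p) <= target]
    return min(feasible, key=energy) if feasible else None

def toy_energy(t):
    """Hypothetical energy model in arbitrary units; not derived from any simulation."""
    return 1.0 + 0.5 * t

n = 255
t_options = range(0, 65)
t_good = pick_config(n, t_options, p=0.001, target=1e-6, energy=toy_energy)
t_bad = pick_config(n, t_options, p=0.03, target=1e-6, energy=toy_energy)
print(t_good, t_bad)
```

A better channel (smaller symbol error probability p) admits a smaller t and therefore a lower value of the toy energy, which is the mechanism a reconfigurable codec would exploit.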

DISPS-1.6

Closed-Form and Real-Time Wordlength Adaptation
Paul D Fiore (Sanders, A Lockheed Martin Co.), Li Lee (Massachusetts Institute of Technology)

FPGA and configurable computing-based DSP algorithms have demonstrated significant performance improvements over software implementations. This has renewed interest in developing DSP algorithms for, or mapping them to, custom hardware. An algorithm will be successfully mapped if the intermediate wordlengths can be reduced to maintain reasonable hardware size. In this paper, we consider linear hardware cost functions, for which we can derive closed-form expressions for the reduced wordlengths. We then apply these results to an adaptive LMS filter, where we adapt not only the tap weights but also the wordlengths as a function of the data in real time.
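One way a linear cost function leads to a closed form can be illustrated with a generic formulation, which is an assumption for this example and not necessarily the one used in the paper. Suppose quantizer i has wordlength b_i, contributes cost c_i b_i, and contributes output noise k_i 2^{-2 b_i}, and the total noise budget is N_0. The symbols c_i, k_i, and N_0 are hypothetical, and b_i is treated as real-valued.

```latex
\begin{align*}
&\text{minimize } C = \sum_i c_i b_i
  \quad \text{subject to } \sum_i k_i 2^{-2 b_i} = N_0 \\
&\frac{\partial}{\partial b_i}\Big[ \sum_j c_j b_j
  + \lambda \Big( \sum_j k_j 2^{-2 b_j} - N_0 \Big) \Big]
  = c_i - 2 \lambda \ln 2 \; k_i 2^{-2 b_i} = 0 \\
&\Rightarrow\; k_i 2^{-2 b_i} = \frac{c_i}{2 \lambda \ln 2},
  \qquad \sum_i k_i 2^{-2 b_i} = N_0
  \;\Rightarrow\; 2 \lambda \ln 2 = \frac{\sum_j c_j}{N_0} \\
&\Rightarrow\; b_i = \frac{1}{2} \log_2 \frac{k_i \sum_j c_j}{c_i N_0}
\end{align*}
```

Under these assumptions each quantizer receives a share c_i N_0 / \sum_j c_j of the noise budget, so costlier wordlengths are allowed more noise. In practice the real-valued b_i would still have to be rounded to integers.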

DISPS-1.7

Synthesis of DSP Soft Real-Time Multiprocessor Systems-on-Silicon
Darko Kirovski, Miodrag Potkonjak (Computer Science Department, University of California, Los Angeles)

The recent convergence of application trends (Internet and embedded applications) and technology trends (reuse and very high integration levels) has resulted in a strong need for the design of soft real-time DSP systems on silicon. We developed a new hierarchical, modular approach for the synthesis of area-efficient soft real-time DSP systems on silicon. This synthesis strategy employs a number of optimization-intensive scheduling, performance monitoring, and allocation steps. The backbone of the optimization approach is a novel on-line scheduling algorithm that uses meta-algorithmic techniques for on-the-fly heuristic selection and parameter tuning. Resource allocation uses a predetermined lower bound on system performance to drive a branch-and-bound search for an area-efficient multiprocessor configuration in which each processor has local instruction and data cache. To bridge the gap between the profiling, modeling, and synthesis tools of the two traditionally independent synthesis domains (architecture and CAD), we develop a new synthesis and evaluation platform that integrates the existing modeling, profiling, and simulation tools with the newly developed system-level synthesis tools. The effectiveness of the approach is demonstrated on the industrial-strength MediaBench benchmark suite.

DISPS-1.8

A Generic Methodology for the Software Managing of Caches in Multi-Processors DSP Architectures
Frantz Lohier, Lionel Lacassagne (EIA/LIS), Patrick Garda (LIS)

This article introduces a novel software engineering methodology designed for the real-time execution of low-level image operators on multiprocessor DSP architectures. We detail the results obtained by implementing our approach on the TMS320C80, a shared-memory multiprocessor architecture [1]. Our contribution is compared with other existing C80 image processing libraries [2][3] in terms of genericity, flexibility, and performance improvement. More specifically, generic mechanisms make it possible to address the requirements of various operators and to extend them within a standard framework. Our approach is flexible enough to allow the dynamic composition of concurrent and reconfigurable processing chains, thanks to a modular library implementing basic operators. Processing chains work on various image sizes and with any number of processors. Above all, our methodology improves performance by enhancing data locality.

Last Update: February 4, 1999 Ingo Höntsch