Authors:
Magnus Lundberg, Lulea University of Technology, Lulea, Sweden
Khurram Muhammad,
Kaushik Roy,
Sarah Kate Wilson, Lulea University of Technology, Lulea, Sweden
Page (NA) Paper number 1763
Abstract:
We address the issue of high-level synthesis of low-power digital
signal processing (DSP) systems by proposing switching activity models.
In particular, we present a technology independent hierarchical scheme
to compare relative power performance of two competing DSP systems.
The basic building blocks considered for such systems are a full-adder
and a one-bit delay. Estimates of switching activity at the outputs
of these building blocks are used to model the activity in different
architectural primitives used for building DSP systems. This method
is very fast and simple and simulations show accuracy within 4% of
extensive bit-level simulations. Therefore, it can easily be integrated
into current communications/DSP CAD tools for low-power applications.
The models show that the choice of multiplier/multiplicand is important
when using array multipliers in a data-path. If the input signal with
smaller variance is chosen as the multiplicand, up to 20% savings
in switching activity can be achieved. This observation is verified
by analog simulation.
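
As a rough illustration of what a switching activity estimate at the output of a full-adder can look like, the sketch below enumerates the eight input combinations and derives the probability that the sum and carry outputs are 1, then converts each to a toggle probability. It is not the authors' model: the function name is hypothetical, and the code assumes that the three inputs are mutually independent and uncorrelated from one clock cycle to the next, in which case a signal that is 1 with probability p toggles with probability 2p(1-p).

```python
from itertools import product


def full_adder_activity(pa, pb, pc):
    """Return (p_sum, p_carry, act_sum, act_carry) for a one-bit full adder.

    pa, pb, pc are the probabilities that inputs a, b and carry-in are 1.
    Assumption (not from the paper): inputs are mutually independent and
    temporally uncorrelated.
    """
    p_sum = 0.0
    p_carry = 0.0
    for a, b, c in product((0, 1), repeat=3):
        # Probability of this particular input combination.
        weight = ((pa if a else 1 - pa)
                  * (pb if b else 1 - pb)
                  * (pc if c else 1 - pc))
        if a ^ b ^ c:          # sum output is the XOR of the inputs
            p_sum += weight
        if a + b + c >= 2:     # carry output is the majority function
            p_carry += weight

    def toggle(p):
        # Probability that two independent consecutive samples differ.
        return 2 * p * (1 - p)

    return p_sum, p_carry, toggle(p_sum), toggle(p_carry)


if __name__ == "__main__":
    print(full_adder_activity(0.5, 0.5, 0.5))
```

Estimates of this kind could be chained from one adder cell to the next to approximate activity in a larger primitive, although correlated signals would violate the independence assumption used here.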
Authors:
Russell E. Henning,
Chaitali Chakrabarti,
Page (NA) Paper number 2218
Abstract:
Characteristics of the data being processed can be used to reduce the
power consumption in the data path of a VLSI circuit by exploiting
their relationship with transition activity during high-level synthesis.
Important relationships between fixed-point, two's complement data
characteristics and 0->1 transition activity in static CMOS circuits
are presented in this paper. Models for computing transition activity
in terms of a new set of transition parameters are developed. Propagation
of data characteristics through multiplication and addition functional
units is discussed. The use of the relationships and models to analyze
and significantly reduce 0->1 transition activity with little computational
effort is illustrated with examples.
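
To make the notion of 0->1 transition activity concrete, the following sketch counts rising transitions at each bit position of a stream of fixed-point, two's complement samples. It is only a brute-force measurement for illustration, not the transition parameter models developed in the paper; the function name and the example stream are hypothetical.

```python
def rising_transitions(samples, width):
    """Count 0->1 transitions per bit position (index 0 is the LSB).

    samples is a sequence of Python integers, interpreted as two's
    complement words of the given width.
    """
    counts = [0] * width
    if not samples:
        return counts
    mask = (1 << width) - 1
    prev = samples[0] & mask          # masking yields the two's complement bits
    for x in samples[1:]:
        cur = x & mask
        rising = ~prev & cur & mask   # bits that were 0 and are now 1
        for bit in range(width):
            counts[bit] += (rising >> bit) & 1
        prev = cur
    return counts


if __name__ == "__main__":
    # A small signal that alternates in sign toggles all sign-extension bits.
    print(rising_transitions([1, -1, 1, -1], 4))
```

In this example the low-magnitude, sign-alternating stream produces rising transitions in the upper bits on every change from positive to negative, which is one way data characteristics show up in transition activity.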
Authors:
Bruce W Suter,
Kenneth S Stevens,
Scott R Velazquez,
Truong Nguyen,
Page (NA) Paper number 1996
Abstract:
Architecture and circuit design are the two most effective means of
reducing power in CMOS VLSI. Mathematical manipulations based on
ideas from multirate signal processing have been applied to create
high performance, low power architectures. To illustrate this approach,
two case studies are presented: one concerns the design of a fast
Fourier transform (FFT) device, and the other concerns the design of
analog-to-digital converters.
Authors:
Sissades Tongsima,
Timothy W O'Neil,
Edwin H.M. Sha,
Page (NA) Paper number 2061
Abstract:
In many applications, selection statements such as if-statements mean
that the computation time of a node can be represented by a random
variable. This paper focuses on iterative applications (those
containing loops) that exhibit such uncertainty. Such an application
can then be transformed to a probabilistic data-flow graph. A challenging
problem is to derive graph transformation techniques which can produce
a good schedule. This paper introduces two timing models, the time-invariant
and time-variant models, to characterize the nature of these applications.
Furthermore, for the time-invariant model, we propose a means of selecting
a minimum rate-optimal unfolding factor which guarantees the best schedule
length. We also propose a good estimate for choosing an unfolding
factor for a graph under the time-variant model.
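
For readers unfamiliar with unfolding, the sketch below shows the standard transformation on an ordinary (deterministic) data-flow graph: each node is replicated once per unfolded iteration, and an edge carrying w delays from u to v becomes edges from copy i of u to copy (i + w) mod f of v with floor((i + w) / f) delays, where f is the unfolding factor. It does not reproduce the probabilistic timing models or the selection of the factor described in the abstract; the function name and the edge representation are assumptions made for illustration.

```python
def unfold(edges, factor):
    """Unfold a data-flow graph by the given factor.

    edges is a list of (src, dst, delays) tuples. The result is a list of
    ((src, i), (dst, j), delays) tuples, where i and j index node copies.
    """
    unfolded = []
    for src, dst, delays in edges:
        for i in range(factor):
            j = (i + delays) % factor            # copy of dst that consumes the value
            new_delays = (i + delays) // factor  # delays left on the new edge
            unfolded.append(((src, i), (dst, j), new_delays))
    return unfolded


if __name__ == "__main__":
    # Hypothetical two-node loop: A feeds B directly, B feeds A through 3 delays.
    loop = [("A", "B", 0), ("B", "A", 3)]
    for edge in unfold(loop, 2):
        print(edge)
```

The copies of each original edge together carry exactly the original number of delays, so the total delay count of the graph is preserved by the transformation.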
Authors:
Manish Goel,
Naresh R Shanbhag,
Page (NA) Paper number 2194
Abstract:
Presented in this paper are energy-optimum reconfiguration strategies
for channel codecs. These strategies are derived by solving an optimization
problem, which has energy consumption as the objective function and
a constraint on the bit error-rate (BER). The energy consumption models
for the reconfigurable Reed-Solomon (RS) codec are derived via gate-level
simulation of the finite field arithmetic modules. These energy models
along with the BER expressions are then employed to derive the energy-optimum
reconfiguration strategies. The energy savings are computed by comparing
the energy consumption of the reconfigurable codec with that of the
static codec. The energy savings range from 0% to 83% for channel signal-to-noise
ratio (SNR) variations from 7 dB to 10 dB. On average, energy savings of 55%
are achieved.
Authors:
Paul D Fiore,
Li Lee,
Page (NA) Paper number 1206
Abstract:
FPGA and configurable computing-based DSP algorithms have demonstrated
significant performance improvements over software implementations.
This has caused recent renewed interest in developing or mapping DSP
algorithms to custom hardware. An algorithm will be successfully mapped
if the intermediate wordlengths can be reduced to maintain reasonable
hardware size. In this paper, we consider linear hardware cost functions,
for which we can derive closed-form expressions for the reduced wordlengths.
We then apply these results to an adaptive LMS filter, where we adapt
not only the tap weights, but also the wordlengths as a function of
the data in real-time.
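
The abstract does not state the closed-form expressions, so the sketch below works through one common textbook formulation rather than the authors' result. It assumes a linear cost sum(c_i * b_i) over wordlengths b_i and an output noise constraint sum(k_i * 2^(-2*b_i)) = N, where the cost weights c_i, noise gains k_i, and budget N are all hypothetical parameters. Setting the derivative of the Lagrangian to zero gives k_i * 2^(-2*b_i) = c_i * N / sum(c), hence b_i = 0.5 * log2(k_i * sum(c) / (c_i * N)). The result is real-valued; rounding to integer wordlengths is left out.

```python
import math


def optimal_wordlengths(costs, gains, noise_budget):
    """Real-valued wordlengths minimizing sum(c_i * b_i) subject to
    sum(k_i * 2**(-2 * b_i)) == noise_budget.

    Assumed model (not taken from the paper): costs are the linear cost
    weights c_i, gains are the noise gains k_i, all strictly positive.
    """
    total_cost = sum(costs)
    return [0.5 * math.log2(k * total_cost / (c * noise_budget))
            for c, k in zip(costs, gains)]


if __name__ == "__main__":
    # Two equally weighted signals sharing a noise budget of 2 * 2**-16.
    print(optimal_wordlengths([1.0, 1.0], [1.0, 1.0], 2 * 2.0 ** -16))
```

Under this model, a signal with a higher cost weight receives fewer bits and a signal with a higher noise gain receives more, which matches the intuition behind trading hardware size against precision.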
Authors:
Darko Kirovski,
Miodrag Potkonjak,
Page (NA) Paper number 1832
Abstract:
The recent convergence of applications (Internet and embedded applications)
and technology (reuse and very high integration level) trends has resulted
in a strong need for the design of soft real-time DSP systems on silicon.
We developed a new hierarchical modular approach for synthesis of
area-efficient soft real-time DSP systems on silicon. This synthesis strategy
employs a number of optimization-intensive scheduling, performance
monitoring, and allocation steps. The backbone of the optimization
approach is a novel on-line scheduling algorithm which uses meta-algorithmic
techniques for on-the-fly heuristic selection and parameter tuning.
Resource allocation uses a predetermined lower bound on system performance
to drive a branch-and-bound search for an area-efficient
multiprocessor configuration where each processor has local instruction
and data cache. In order to bridge the gap between the profiling, modeling,
and synthesis tools of the two traditionally independent synthesis
domains (architecture and CAD), we develop a new synthesis and evaluation
platform which integrates the existing modeling, profiling, and simulation
tools with the newly developed system-level synthesis tools. The effectiveness
of the approach is demonstrated on the industrial-strength MediaBench
benchmark suite.
Authors:
Frantz Lohier,
Lionel Lacassagne,
Patrick Garda,
Page (NA) Paper number 1668
Abstract:
This article introduces a novel software engineering methodology designed
for the real-time execution of low-level image operators running on
multiprocessor DSP architectures. We detail the results we obtained
while implementing our approach on the TMS320C80, a shared-memory multiprocessor
architecture [1]. Our contribution is compared with other existing C80
image processing libraries [2][3] in terms of genericity, flexibility,
and performance improvement. More specifically, generic mechanisms
make it possible to address the requirements of various operators and to extend
them using a standard framework. Our approach is flexible enough to
allow for the dynamic composition of concurrent and reconfigurable processing
chains thanks to a modular library implementing basic operators. Processing
chains work on various image sizes and with any number of processors.
Above all, our methodology permits performance improvement by enhancing
data locality.