1:00, DISPS-P1.1
DISCRETE POLYNOMIAL TRANSFORM REPRESENTATION USING BINARY MATRICES AND FLOW DIAGRAMS
M. ABURDENE, R. KOZICK, R. MAGARGLE, J. MALONEY-HAHN, C. COVIELLO
This paper presents a new method for computing discrete polynomial transforms. The method is shown for the Hermite, binomial, and Laguerre transforms. The new method factors Pascal's matrix into binary matrices. Constructing the flow diagrams for the transform matrices requires only additions and N-2 multipliers for N-point Hermite and binomial transforms, and 2N multipliers for an N-point Laguerre transform. The method involves a three-stage process where stages 1 and 3 are identical for all three transforms.
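To make the idea of factoring Pascal's matrix into binary matrices concrete, here is a minimal sketch of one well-known factorization of the lower-triangular Pascal matrix into 0/1 bidiagonal factors. It is not necessarily the factorization used in the paper, and the helper names (`binary_factor`, `pascal_from_binary_factors`) are illustrative assumptions. Because every factor contains only zeros and ones, applying the factors to a vector needs additions only.

```python
from math import comb


def identity(n):
    """n x n identity matrix as a list of lists."""
    return [[1 if i == j else 0 for j in range(n)] for i in range(n)]


def binary_factor(n, k):
    """Identity plus ones on the subdiagonal for rows k..n-1 (all entries 0 or 1)."""
    m = identity(n)
    for i in range(k, n):
        m[i][i - 1] = 1
    return m


def matmul(a, b):
    """Plain integer matrix product of two square matrices."""
    n = len(a)
    return [[sum(a[i][t] * b[t][j] for t in range(n)) for j in range(n)]
            for i in range(n)]


def pascal_from_binary_factors(n):
    """Form E_{n-1} * ... * E_2 * E_1, where E_k = binary_factor(n, k)."""
    product = identity(n)
    for k in range(1, n):
        product = matmul(binary_factor(n, k), product)
    return product


if __name__ == "__main__":
    n = 5
    pascal = [[comb(i, j) for j in range(n)] for i in range(n)]
    print(pascal_from_binary_factors(n) == pascal)
```

Each factor corresponds to one column of adders in a flow diagram: output i is the sum of inputs i and i-1 for the rows where the subdiagonal entry is one.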
1:00, DISPS-P1.2
DATA AND INSTRUCTION MEMORY EXPLORATION OF EMBEDDED SYSTEMS FOR MULTIMEDIA APPLICATIONS
M. DASIGENIS, N. KROUPIS, A. ARGYRIOU, N. ZERVAS, K. TATAS, D. SOUDRIS
A methodology for power optimization of the data memory hierarchy and instruction memory is introduced. The effect of the methodology on a set of widely used multimedia application kernels, namely Full Search (FS), Hierarchical Search (HS), and Parallel Hierarchical One Dimension Search (PHODS), is demonstrated. Three different target architecture models are used. The issues of data memory power reduction and instruction memory power reduction are tackled separately. We find the power-optimal data memory hierarchy by applying the appropriate data-reuse transformation, while the instruction power optimization is done using a suitable cache memory. Using data-reuse transformations, performance optimization techniques, and instruction-level transformations, we perform an exhaustive exploration of all the possible alternatives to reach power-efficient solutions. The experimental results demonstrate the efficiency of the methodology in terms of power for all the multimedia kernels.
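For readers unfamiliar with the Full Search kernel named above, the following is a minimal sketch of full-search block matching with a sum-of-absolute-differences (SAD) cost. The frame layout (lists of rows), the function names, and the parameters `block` and `search` are assumptions made for illustration; the abstract does not specify the kernel's exact form.

```python
def sad(cur, ref, bx, by, dx, dy, block):
    """Sum of absolute differences between the current block at (bx, by)
    and the reference block displaced by (dx, dy)."""
    return sum(abs(cur[by + r][bx + c] - ref[by + dy + r][bx + dx + c])
               for r in range(block) for c in range(block))


def full_search(cur, ref, bx, by, block, search):
    """Test every displacement in [-search, search]^2 and return
    (best_cost, dx, dy); candidates that leave the frame are skipped."""
    height, width = len(ref), len(ref[0])
    best = None
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            x, y = bx + dx, by + dy
            if x < 0 or y < 0 or x + block > width or y + block > height:
                continue
            cost = sad(cur, ref, bx, by, dx, dy, block)
            if best is None or cost < best[0]:
                best = (cost, dx, dy)
    return best


if __name__ == "__main__":
    size = 12
    # Synthetic frames: the current frame is the reference shifted by (2, 1).
    ref = [[r * 100 + c for c in range(size)] for r in range(size)]
    cur = [[(r + 1) * 100 + (c + 2) for c in range(size)] for r in range(size)]
    print(full_search(cur, ref, bx=4, by=4, block=4, search=2))
```

The two nested displacement loops read the same reference pixels many times, which is why kernels of this kind are natural targets for memory-hierarchy exploration.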
1:00, DISPS-P1.3
AN EFFICIENT TIMING MODEL FOR HARDWARE IMPLEMENTATION OF MULTIRATE DATAFLOW GRAPHS
N. CHANDRACHOODAN, S. BHATTACHARYYA, K. LIU
We consider the problem of representing timing information associated
with functions in a dataflow graph used to represent a signal
processing system in the context of high-level hardware
(architectural) synthesis. This information is used for synthesis of
appropriate architectures for implementing the graph. Conventional
models for timing suffer from shortcomings that make it difficult to
represent timing information in a hierarchical manner, especially for
multirate signal processing systems.
We identify some of these shortcomings, and provide an alternate model
that does not have these problems. We show that with some reasonable
assumptions on the way hardware implementations of multirate systems
operate, we can derive general hierarchical descriptions of multirate
systems similarly to single-rate systems. Several analytical results,
such as the computation of the iteration period bound, that previously
applied only to single-rate systems can also be extended easily to
multirate systems under the new assumptions.
We have applied our model to several multirate signal processing
applications, and obtained favorable results. We present
results of the timing information computed for several multirate DSP
applications that show how the new treatment can streamline the
problem of performance analysis and synthesis of such systems.
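As background for the iteration period bound mentioned above, here is a minimal sketch of its classical single-rate definition: the maximum, over all cycles of the graph, of the total execution time on the cycle divided by the number of delays on it. The multirate extension described in the abstract is not reproduced here. The graph encoding and the function name `iteration_bound` are assumptions, and the brute-force cycle enumeration is only suitable for small graphs.

```python
from fractions import Fraction


def iteration_bound(exec_time, edges):
    """exec_time: {node: integer execution time}
    edges: list of (src, dst, delays) with integer delay counts.
    Returns the maximum cycle ratio as a Fraction, or None if acyclic."""
    adj = {}
    for src, dst, delays in edges:
        adj.setdefault(src, []).append((dst, delays))
    best = None

    def dfs(start, node, time_sum, delay_sum, visited):
        nonlocal best
        for nxt, delays in adj.get(node, []):
            if nxt == start:
                total_delay = delay_sum + delays
                if total_delay == 0:
                    raise ValueError("delay-free cycle: the graph deadlocks")
                ratio = Fraction(time_sum, total_delay)
                if best is None or ratio > best:
                    best = ratio
            elif nxt not in visited and nxt > start:
                # Only visit nodes larger than the start node so that each
                # simple cycle is enumerated once, from its smallest node.
                dfs(start, nxt, time_sum + exec_time[nxt],
                    delay_sum + delays, visited | {nxt})

    for node in sorted(exec_time):
        dfs(node, node, exec_time[node], 0, {node})
    return best


if __name__ == "__main__":
    times = {"A": 2, "B": 3, "C": 1}
    graph = [("A", "B", 0), ("B", "A", 1), ("B", "C", 0), ("C", "A", 2)]
    print(iteration_bound(times, graph))
```

In this hypothetical example the cycle A-B-A has ratio 5/1 and the cycle A-B-C-A has ratio 6/2, so the first cycle determines the bound.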
1:00, DISPS-P1.4
IMPLEMENTING PARALLELISM AND SCHEDULING DATA FLOW GRAPHS ON JAVA VIRTUAL MACHINE
J. XU, E. SHA
In this paper, we present a scheme which explores the parallelism on the JVM. An algorithm called dynamic-duplication scheduling is developed for solving the static scheduling and code generation problems for data flow graphs (DFG) on the parallel JVM. Experimental results show that the schedule produced by the algorithm on the parallel JVM is significantly improved compared with the traditional JVM.
1:00, DISPS-P1.5
DSP DATAPATH SYNTHESIS FOR LOW-POWER APPLICATIONS
L. CHIOU, K. MUHAMMAD, K. ROY
In this paper, we present a high-level synthesis technique targeting low power consumption for data-dominated applications. We have used a statistical estimation technique to obtain the switching activity of modules when sharing of computing resources is required in a design. The technique enables us to understand switching behavior under resource sharing. Using the relationship between switching power and resource sharing thus obtained, we developed a scheduling and allocation algorithm to reduce data path switching power. Experiments performed on various examples show up to 49% improvement in power reduction under resource constraints.
1:00, DISPS-P1.6
POWER EFFICIENT SEMI-AUTOMATIC INSTRUCTION ENCODING FOR APPLICATION SPECIFIC INSTRUCTION SET PROCESSORS
T. GLOEKLER, S. BITTERLICH
A novel design methodology for the implementation of control units
for application specific instruction set processors (ASIPs) is
described. This methodology uses automatic instruction encoding and
semi-automatic generation of the hardware instruction decoder to speed
up the ASIP design. Significant power savings due to optimized
instruction encoding are achieved. Results for ICORE (ISS-Core),
which is an ASIP for digital video broadcasting algorithms of
Infineon Technologies, demonstrate the efficiency and applicability
of this approach.
1:00, DISPS-P1.7
DIGIT-SERIAL MODULAR MULTIPLICATION USING SKEW-TOLERANT DOMINO CMOS
S. KIM, G. SOBELMAN
A novel connection between digit-serial computing and skew-tolerant
domino circuit design is developed and applied to the design of
a 512-bit modular multiplier. In our design,
a digit size of four bits is
efficiently mapped onto a four-phase overlapping clocking scheme,
so that four bits are processed during each full clock cycle.
Our architecture is based on a modified interleaved multiplication algorithm
and uses precomputed complements of the modulus and a
carry-save adder scheme.
We also present a technique for modeling time borrowing behavior
in skew-tolerant domino using a VHDL behavioral description. This
allows very large skew-tolerant domino circuits
to be simulated efficiently in such a way that the
essential time borrowing behavior is correctly represented.
This simulation methodology is used to verify the correctness of
our design and to determine its throughput.
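The arithmetic behind a digit-serial interleaved modular multiplier can be sketched in a few lines. The example below is a software model under stated assumptions, not the authors' modified algorithm: it scans the multiplier four bits at a time from the most significant digit and uses Python's `%` operator in place of the hardware reduction step (the precomputed complements of the modulus and the carry-save adders are not modeled). The function name and parameters are hypothetical.

```python
def interleaved_modmul(x, y, modulus, bits=512, digit_bits=4):
    """Compute (x * y) mod modulus by interleaving shift-add steps with
    reduction, consuming digit_bits bits of x per iteration (MSB first)."""
    if bits % digit_bits != 0:
        raise ValueError("bits must be a multiple of digit_bits")
    if not (0 <= x < (1 << bits)):
        raise ValueError("x must fit in the given number of bits")
    mask = (1 << digit_bits) - 1
    acc = 0
    for shift in range(bits - digit_bits, -1, -digit_bits):
        digit = (x >> shift) & mask
        # Shift the partial result by one digit, add digit * y, then reduce.
        # The '%' here stands in for the hardware modular reduction.
        acc = ((acc << digit_bits) + digit * y) % modulus
    return acc


if __name__ == "__main__":
    m = (1 << 511) + 12345          # arbitrary 512-bit modulus for the demo
    a = pow(3, 300) % m
    b = pow(7, 170) % m
    print(interleaved_modmul(a, b, m) == (a * b) % m)
```

With a 4-bit digit, a 512-bit operand takes 128 iterations, which matches the idea of processing one digit per full clock cycle.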
1:00, DISPS-P1.8
A SURVEY ON MODELLING ISSUES USING THE MACHINE DESCRIPTION LANGUAGE LISA
A. HOFFMANN, A. NOHL, G. BRAUN, H. MEYR
This paper presents a survey on modeling issues of programmable
architectures using the machine description language LISA. Various
architectures presenting diverse architectural characteristics will be
shown and the feasibility of automatically generating simulator,
assembler, linker and graphical debugger frontend will be discussed.
The presented approach is not limited to a fixed abstraction level
- case studies of the Texas Instruments C62x and C54x, the Analog
Devices ADSP2101 as well as the ARM7 will show the applicability of
the methodology from cycle/phase to instruction accurate models.
Please refer to http://www.iss.rwth-aachen.de/lisa for more information
1:00, DISPS-P1.9
CODEF: A SYSTEM LEVEL DESIGN SPACE EXPLORATION TOOL
M. AUGUIN, L. CAPELLA, F. CUESTA, E. GRESSET
The increasing complexity of embedded applications, combined with the advances in chip integration, makes the design process a very challenging task. Due to this rising complexity, the design of a system-on-a-chip (SOC) composed of mixed software-hardware units, under performance, area and consumption constraints, becomes increasingly intricate. This paper presents a method and an associated tool (CODEF) which allow the designer to do an automatic and/or interactive system design space exploration in order to construct cost-effective embedded real-time architectures dedicated to complex signal processing applications. The method is based on a recursive partitioning algorithm followed by a communication synthesis procedure.
1:00, DISPS-P1.10
ANALYTICAL EXPLORATION OF POWER EFFICIENT DATA-REUSE TRANSFORMATIONS ON MULTIMEDIA APPLICATIONS
S. KOUGIA, A. CHATZIGEORGIOU, N. ZERVAS, S. NIKOLAIDIS
Power savings that can be achieved by data-reuse decisions targeting a custom memory hierarchy for multimedia applications executing on embedded cores are examined in this paper. Exploiting the temporal locality of memory accesses in data-intensive applications, a set of data-reuse transformations on a typical motion estimation algorithm is determined. The aim is to reduce data-related power consumption by moving background memory accesses to smaller foreground memories, which are less power costly. The impact of these transformations on power, performance and area is evaluated both for application specific circuits and general purpose processors. The number of data and instruction memory accesses is analytically calculated, enabling a fast exploration of the design space by varying algorithmic parameters.
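The kind of analytical access counting described above can be illustrated with a deliberately simplified model. The sketch below is an assumption-laden illustration, not the paper's formulas: it counts background-memory reads for full-search motion estimation with and without small foreground buffers, ignores frame-boundary clipping and any overlap between neighboring search windows, and uses hypothetical parameter names (`width`, `height`, `block`, `search`).

```python
def background_reads(width, height, block, search):
    """Count background-memory pixel reads for full-search motion estimation.

    'no_reuse': every candidate reads the current block and a reference
                block directly from background memory.
    'with_reuse': the current block and the whole search window are copied
                once per block into foreground buffers, and all candidate
                comparisons then read those buffers instead.
    """
    blocks = (width // block) * (height // block)
    candidates = (2 * search + 1) ** 2
    pixels = block * block
    window = (block + 2 * search) ** 2

    no_reuse = blocks * candidates * 2 * pixels
    with_reuse = blocks * (pixels + window)
    # Reads that moved from background to the smaller foreground buffers.
    foreground = blocks * candidates * 2 * pixels
    return {"no_reuse": no_reuse,
            "with_reuse": with_reuse,
            "foreground_reads": foreground}


if __name__ == "__main__":
    # Hypothetical parameters: a 176x144 frame, 16x16 blocks, +/-8 search.
    counts = background_reads(width=176, height=144, block=16, search=8)
    for name, value in counts.items():
        print(name, value)
```

Because the counts are closed-form expressions in the algorithmic parameters, sweeping `block` or `search` is a matter of re-evaluating a formula rather than re-simulating the kernel.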