Session: DISPS-P1
Time: 1:00 - 3:00, Thursday, May 10, 2001
Location: Exhibit Hall Area 6
Title: Design Methodologies/Low Power DSP
Chair: Chaitali Chakrabarti

1:00, DISPS-P1.1
DISCRETE POLYNOMIAL TRANSFORM REPRESENTATION USING BINARY MATRICES AND FLOW DIAGRAMS
M. ABURDENE, R. KOZICK, R. MAGARGLE, J. MALONEY-HAHN, C. COVIELLO
This paper presents a new method for computing discrete polynomial transforms. The method is shown for the Hermite, binomial, and Laguerre transforms. The new method factors Pascal's matrix into binary matrices. The resulting flow diagrams for the transform matrices require only additions and N-2 multipliers for N-point Hermite and binomial transforms, and 2N multipliers for an N-point Laguerre transform. The method involves a three-stage process where stages 1 and 3 are identical for all three transforms.
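
As a rough illustration of factoring Pascal's matrix into binary matrices, the sketch below uses a textbook factorization of the lower-triangular Pascal matrix into 0/1 bidiagonal factors. It is an assumption that this resembles the factorization used in the paper; the function names are hypothetical, and the multiplier stages of the Hermite, binomial, and Laguerre transforms are not modeled.

```python
from math import comb


def pascal_lower(n):
    """Lower-triangular Pascal matrix: entry (i, j) is C(i, j)."""
    return [[comb(i, j) for j in range(n)] for i in range(n)]


def binary_factor(n, k):
    """0/1 factor G_k: identity plus subdiagonal ones in the last k rows."""
    g = [[1 if i == j else 0 for j in range(n)] for i in range(n)]
    for i in range(n - k, n):
        g[i][i - 1] = 1
    return g


def matmul(a, b):
    """Plain integer matrix product of two square matrices."""
    n = len(a)
    return [[sum(a[i][t] * b[t][j] for t in range(n)) for j in range(n)]
            for i in range(n)]


def pascal_from_binary_factors(n):
    """Multiply G_1 * G_2 * ... * G_{n-1}; the product is the Pascal matrix."""
    result = [[1 if i == j else 0 for j in range(n)] for i in range(n)]
    for k in range(1, n):
        result = matmul(result, binary_factor(n, k))
    return result


def pascal_apply_additions(x):
    """Apply the Pascal matrix to x using additions only (one stage per factor)."""
    y = list(x)
    n = len(y)
    # Factors are applied right to left: G_{n-1} first, G_1 last.
    for k in range(n - 1, 0, -1):
        # Descending i so that y[i - 1] still holds the previous stage's value.
        for i in range(n - 1, n - k - 1, -1):
            y[i] += y[i - 1]
    return y
```

Each factor corresponds to one column of adders in a flow diagram, which is why applying the Pascal matrix this way needs no multipliers.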

1:00, DISPS-P1.2
DATA AND INSTRUCTION MEMORY EXPLORATION OF EMBEDDED SYSTEMS FOR MULTIMEDIA APPLICATIONS
M. DASIGENIS, N. KROUPIS, A. ARGYRIOU, N. ZERVAS, K. TATAS, D. SOUDRIS
A methodology for power optimization of the data memory hierarchy and the instruction memory is introduced. The effect of the methodology on a set of widely used multimedia application kernels, namely Full Search (FS), Hierarchical Search (HS), and Parallel Hierarchical One Dimension Search (PHODS), is demonstrated. Three different target architecture models are used. Power reduction for the data memory and for the instruction memory is tackled separately. We find the power-optimal data memory hierarchy by applying the appropriate data-reuse transformations, while the instruction power optimization is done using a suitable cache memory. Using data-reuse transformations, performance optimization techniques, and instruction-level transformations, we perform an exhaustive exploration of all the possible alternatives to reach power-efficient solutions. The experimental results demonstrate the efficiency of the methodology in terms of power for all the multimedia kernels.

1:00, DISPS-P1.3
AN EFFICIENT TIMING MODEL FOR HARDWARE IMPLEMENTATION OF MULTIRATE DATAFLOW GRAPHS
N. CHANDRACHOODAN, S. BHATTACHARYYA, K. LIU
We consider the problem of representing timing information associated with functions in a dataflow graph used to represent a signal processing system in the context of high-level hardware (architectural) synthesis. This information is used for synthesis of appropriate architectures for implementing the graph. Conventional models for timing suffer from shortcomings that make it difficult to represent timing information in a hierarchical manner, especially for multirate signal processing systems. We identify some of these shortcomings and provide an alternative model that does not have these problems. We show that, with some reasonable assumptions on the way hardware implementations of multirate systems operate, we can derive general hierarchical descriptions of multirate systems in the same way as for single-rate systems. Several analytical results, such as the computation of the iteration period bound, that previously applied only to single-rate systems can also easily be extended to multirate systems under the new assumptions. We have applied our model to several multirate signal processing applications and obtained favorable results. We present results of the timing information computed for several multirate DSP applications that show how the new treatment can streamline the problem of performance analysis and synthesis of such systems.
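
For readers unfamiliar with the iteration period bound mentioned above, the following minimal sketch computes it for a single-rate dataflow graph using the standard definition: the maximum, over all cycles, of total execution time divided by total delay count. The multirate extension described in the abstract is not shown; the function name, the graph encoding, and the brute-force cycle enumeration are assumptions made for illustration only.

```python
from fractions import Fraction


def iteration_bound(exec_time, edges):
    """Iteration period bound of a single-rate dataflow graph.

    exec_time: dict mapping node -> integer execution time.
    edges: list of (src, dst, delays) with integer delay counts.
    Returns a Fraction, or None if the graph has no cycle.
    Brute-force enumeration of simple cycles; fine for small graphs only.
    """
    adj = {}
    for src, dst, delays in edges:
        adj.setdefault(src, []).append((dst, delays))

    best = None

    def dfs(start, node, time_sum, delay_sum, visited):
        nonlocal best
        for nxt, delays in adj.get(node, []):
            if nxt == start:
                total_delay = delay_sum + delays
                if total_delay == 0:
                    raise ValueError("delay-free cycle: graph cannot be scheduled")
                ratio = Fraction(time_sum, total_delay)
                if best is None or ratio > best:
                    best = ratio
            elif nxt > start and nxt not in visited:
                # Only visit nodes larger than start so each cycle is found once.
                dfs(start, nxt, time_sum + exec_time[nxt],
                    delay_sum + delays, visited | {nxt})

    for start in sorted(exec_time):
        dfs(start, start, exec_time[start], 0, {start})
    return best


# Hypothetical three-node graph: cycle A-B has ratio 6/2, cycle B-C has ratio 5/1.
example_times = {"A": 2, "B": 4, "C": 1}
example_edges = [("A", "B", 0), ("B", "A", 2), ("B", "C", 0), ("C", "B", 1)]
example_bound = iteration_bound(example_times, example_edges)
```

In this example the B-C loop is the critical cycle, so no implementation of the graph can start iterations more often than once every 5 time units.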

1:00, DISPS-P1.4
IMPLEMENTING PARALLELISM AND SCHEDULING DATA FLOW GRAPHS ON JAVA VIRTUAL MACHINE
J. XU, E. SHA
In this paper, we present a scheme that exploits parallelism on the JVM. An algorithm called dynamic-duplication scheduling is developed to solve static scheduling and code generation for data flow graphs (DFG) on the parallel JVM. Experimental results show that the schedule produced by the algorithm on the parallel JVM is significantly improved compared with the traditional JVM.

1:00, DISPS-P1.5
DSP DATAPATH SYNTHESIS FOR LOW-POWER APPLICATIONS
L. CHIOU, K. MUHAMMAD, K. ROY
In this paper, we present a high-level synthesis technique targeting low power consumption for data-dominated applications. We use a statistical estimation technique to obtain the switching activity of modules when sharing of computing resources is required in a design. The technique enables us to understand switching behavior under resource sharing. Using the relationship between switching power and resource sharing thus obtained, we developed a scheduling and allocation algorithm to reduce data path switching power. Experiments performed on various examples show up to 49% improvement in power under resource constraints.

1:00, DISPS-P1.6
POWER EFFICIENT SEMI-AUTOMATIC INSTRUCTION ENCODING FOR APPLICATION SPECIFIC INSTRUCTION SET PROCESSORS
T. GLOEKLER, S. BITTERLICH
A novel design methodology for the implementation of control units for application specific instruction set processors (ASIPs) is described. This methodology uses automatic instruction encoding and semi-automatic generation of the hardware instruction decoder to speed up the ASIP design. Significant power savings due to optimized instruction encoding are achieved. Results for ICORE (ISS-Core), which is an ASIP for digital video broadcasting algorithms of Infineon Technologies, demonstrate the efficiency and applicability of this approach.

1:00, DISPS-P1.7
DIGIT-SERIAL MODULAR MULTIPLICATION USING SKEW-TOLERANT DOMINO CMOS
S. KIM, G. SOBELMAN
A novel connection between digit-serial computing and skew-tolerant domino circuit design is developed and applied to the design of a 512-bit modular multiplier. In our design, a digit size of four bits is efficiently mapped onto a four-phase overlapping clocking scheme, so that four bits are processed during each full clock cycle. Our architecture is based on a modified interleaved multiplication algorithm and uses precomputed complements of the modulus and a carry save adder scheme. We also present a technique for modeling time borrowing behavior in skew-tolerant domino using a VHDL behavioral description. This allows very large skew-tolerant domino circuits to be simulated efficiently in such a way that the essential time borrowing behavior is correctly represented. This simulation methodology is used to verify the correctness of our design and to determine its throughput.
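
To make the digit-serial idea concrete, here is a minimal software sketch of textbook interleaved modular multiplication that consumes one four-bit digit of the multiplier per step, most significant digit first. It is not the paper's modified algorithm: the precomputed complements of the modulus, the carry-save adders, and the clocking scheme are not modeled, and Python's `%` operator stands in for the hardware reduction step. The function name and parameters are hypothetical.

```python
def interleaved_modmul(a, b, m, digit_bits=4, width=512):
    """Compute (a * b) mod m, scanning `a` one digit at a time, MSB first.

    Textbook interleaved multiplication: shift the partial result by one
    digit, add digit * b, then reduce modulo m.
    """
    if m <= 0:
        raise ValueError("modulus must be positive")
    if a < 0 or a >= (1 << width):
        raise ValueError("a must fit in `width` bits")

    num_digits = (width + digit_bits - 1) // digit_bits
    mask = (1 << digit_bits) - 1
    b %= m
    p = 0
    for idx in range(num_digits - 1, -1, -1):
        digit = (a >> (idx * digit_bits)) & mask
        # One digit-serial step: shift, accumulate, reduce.
        p = (p << digit_bits) + digit * b
        p %= m
    return p
```

With a 512-bit operand and four-bit digits the loop runs 128 times, one iteration per digit.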

1:00, DISPS-P1.8
A SURVEY ON MODELLING ISSUES USING THE MACHINE DESCRIPTION LANGUAGE LISA
A. HOFFMANN, A. NOHL, G. BRAUN, H. MEYR
This paper presents a survey on modeling issues of programmable architectures using the machine description language LISA. Various architectures with diverse architectural characteristics will be shown, and the feasibility of automatically generating a simulator, assembler, linker and graphical debugger frontend will be discussed. The presented approach is not limited to a fixed abstraction level - case studies of the Texas Instruments C62x and C54x, the Analog Devices ADSP2101 as well as the ARM7 will show the applicability of the methodology from cycle/phase to instruction accurate models. Please refer to http://www.iss.rwth-aachen.de/lisa for more information.

1:00, DISPS-P1.9
CODEF: A SYSTEM LEVEL DESIGN SPACE EXPLORATION TOOL
M. AUGUIN, L. CAPELLA, F. CUESTA, E. GRESSET
The increasing complexity of embedded applications, combined with advances in chip integration, makes the design process a very challenging task. Due to this rising complexity, designing a system-on-a-chip (SOC) composed of mixed software-hardware units under performance, area and consumption constraints becomes increasingly intricate. This paper presents a method and an associated tool (CODEF) which allow the designer to perform an automatic and/or interactive system design space exploration in order to construct cost-effective embedded real-time architectures dedicated to complex signal processing applications. The method is based on a recursive partitioning algorithm followed by a communication synthesis procedure.

1:00, DISPS-P1.10
ANALYTICAL EXPLORATION OF POWER EFFICIENT DATA-REUSE TRANSFORMATIONS ON MULTIMEDIA APPLICATIONS
S. KOUGIA, A. CHATZIGEORGIOU, N. ZERVAS, S. NIKOLAIDIS
Power savings that can be achieved by data-reuse decisions targeting a custom memory hierarchy for multimedia applications executing on embedded cores are examined in this paper. Exploiting the temporal locality of memory accesses in data-intensive applications, a set of data-reuse transformations on a typical motion estimation algorithm is determined. The aim is to reduce data-related power consumption by moving background memory accesses to smaller foreground memories, which consume less power. The impact of these transformations on power, performance and area is evaluated both for application specific circuits and general purpose processors. The number of data and instruction memory accesses is analytically calculated, enabling a fast exploration of the design space by varying algorithmic parameters.