Session: DISPS-L2
Time: 1:00 - 3:00, Wednesday, May 9, 2001
Location: Room 250 D
Title: Design and Implementation-Custom Processors
Chair: Magdy Bayoumi

1:00, DISPS-L2.1
A DIGITAL CHIP FOR ROBUST SPEECH RECOGNITION IN NOISY ENVIRONMENT
C. KIM, S. LEE
A digital chip has been developed for isolated word recognition in real-world noisy environments. By carefully comparing recognition performance and hardware implementability, a modified-ZCPA model and an RBF neural network model are selected for the feature extractor and classifier, respectively. The modified-ZCPA model is based on the feature extraction mechanism of the human auditory system, and has demonstrated superiority for noisy speech. The RBF network has excellent OOV (out-of-vocabulary) rejection capability as well as good recognition performance. Both the feature extractor and the classifier are implemented by repeating simple operations, which reduces the number of memory operations and makes full use of the memory bandwidth. The chip is custom designed at the logic level without any DSP core, and implemented on an FPGA with a 12 MHz clock speed.

1:20, DISPS-L2.2
FPGA IMPLEMENTATION OF A TUNABLE BANDPASS FILTER USING THE "BASIC HETERODYNE BLOCK"
D. SASIDARAN, A. AZAM, K. NELSON, M. SODERSTRAND
Any band-pass filter may be converted into a tunable filter with a single tuning parameter through the use of a new Tunable Heterodyne Band-Pass Filter concept, in which the frequency of the heterodyne signal is adjusted, thereby translating the entire filter transfer function in frequency. If the fixed filter is selected to be a narrow-band band-pass filter, the new Tunable Heterodyne Band-Pass Filter concept can be used very effectively to eliminate narrow-band interference in wide-band communications or control systems. A single-chip version of this tunable filter can be constructed using a Xilinx 4010 chip. The resulting filter is a flexible tunable band-pass filter that can be varied from DC to pi/2 or from pi/2 to pi, depending on the parameters of the fixed output low-pass filter.

1:40, DISPS-L2.3
GENERIC SCHEDULING METHODS FOR A LINEAR QR ARRAY SOC PROCESSOR
J. MCCANNY, Z. LIU, G. LIGHTBODY, R. WALKE, Y. HU, L. ZHAOHUI
A scheduling method for implementing a generic linear QR array processor architecture is presented. This improves on previous work. It also considerably simplifies the derivation of schedules for a folded linear system, where detailed account has to be taken of processor cell latency. The architecture and scheduling derived provide the basis of a generator for the rapid design of System-on-a-Chip (SoC) cores for QR decomposition.

2:00, DISPS-L2.4
EFFICIENT IMPLEMENTATION OF A SET OF LIFTING BASED WAVELET FILTERS
K. ANDRA, C. CHAKRABARTI, T. ACHARYA
Lifting-based wavelet transform implementation not only reduces the number of computations but also achieves lossy-to-lossless performance with finite precision. In this paper, we first perform a precision analysis for the set of seven filters proposed by the JPEG2000 verification model. We determine the precision required to implement the filters using fixed-point 2's complement arithmetic for lossless as well as lossy coding. Next, we propose a unified architecture for implementing this set of filters for both the forward and the inverse transform.
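
As an illustration of why lifting permits lossless operation in finite precision, the following is a minimal sketch of one level of the integer (5,3) lifting transform, a filter commonly used for reversible coding. It is not the architecture proposed in the paper, and the abstract does not say which seven filters are covered; the function names `lift53_forward` and `lift53_inverse` and the mirrored boundary handling are assumptions made for this example.

```python
def lift53_forward(x):
    """One level of integer (5,3) lifting on an even-length list of ints.

    Returns (s, d): the low-pass (approximation) and high-pass (detail) halves.
    Boundary samples are mirrored; this is an assumption of the sketch.
    """
    n = len(x)
    if n < 2 or n % 2:
        raise ValueError("expected an even-length signal with at least 2 samples")
    even, odd = x[0::2], x[1::2]
    half = n // 2
    # Predict step: each odd sample minus the floored average of its even
    # neighbours (>> 1 is a floor division by 2, also for negative values).
    d = [odd[i] - ((even[i] + even[min(i + 1, half - 1)]) >> 1)
         for i in range(half)]
    # Update step: each even sample plus a rounded quarter of the
    # neighbouring detail values.
    s = [even[i] + ((d[max(i - 1, 0)] + d[i] + 2) >> 2)
         for i in range(half)]
    return s, d


def lift53_inverse(s, d):
    """Undo lift53_forward exactly by running the lifting steps in reverse."""
    half = len(s)
    # Undo the update step first, using the same rounded expression.
    even = [s[i] - ((d[max(i - 1, 0)] + d[i] + 2) >> 2)
            for i in range(half)]
    # Then undo the predict step from the recovered even samples.
    odd = [d[i] + ((even[i] + even[min(i + 1, half - 1)]) >> 1)
           for i in range(half)]
    x = [0] * (2 * half)
    x[0::2], x[1::2] = even, odd
    return x


if __name__ == "__main__":
    signal = [10, 12, 15, 9, -3, 4, 7, 7]
    s, d = lift53_forward(signal)
    print(s, d)
    print(lift53_inverse(s, d) == signal)
```

Because the inverse subtracts exactly the same rounded quantities that the forward pass added, reconstruction is bit-exact with integer arithmetic alone, which is the property that makes the required word length the main design question.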

2:20, DISPS-L2.5
AN FPGA IMPLEMENTATION OF WALSH-HADAMARD TRANSFORMS FOR SIGNAL PROCESSING
A. AMIRA, A. BOURIDANE, P. MILLIGAN, M. ROULA
The Walsh-Hadamard transforms are important in many signal processing applications, including speech compression, filtering and coding. This paper presents novel architectures for the Fast Hadamard Transforms using both systolic architecture and distributed arithmetic techniques. The first approach uses the Baugh-Wooley multiplication algorithm for a systolic architecture implementation. The second approach is based on a distributed arithmetic ROM and accumulator structure combined with a sparse matrix factorisation technique. The mathematical models for the two techniques, together with the implementation of the algorithms on a Xilinx FPGA board, are described. The distributed arithmetic approach exhibits better performance than the systolic architecture approach.
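
For orientation, a fast Hadamard transform can be sketched in software as log2(N) stages of add/subtract butterflies, which corresponds to factoring the Hadamard matrix into sparse stages. The sketch below is a generic software version in natural (Hadamard) ordering, not the paper's systolic or distributed arithmetic hardware; the function name `fwht` is an assumption made for this example.

```python
def fwht(values):
    """Unnormalised fast Walsh-Hadamard transform, natural (Hadamard) order.

    The input length must be a power of two. A new list is returned and the
    input is left unchanged.
    """
    a = list(values)
    n = len(a)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    h = 1
    while h < n:
        # One sparse stage: combine elements that are h apart with a
        # sum/difference butterfly. No multiplications are needed.
        for start in range(0, n, 2 * h):
            for i in range(start, start + h):
                u, v = a[i], a[i + h]
                a[i], a[i + h] = u + v, u - v
        h *= 2
    return a


if __name__ == "__main__":
    x = [1, 0, 1, 0, 0, 1, 1, 0]
    y = fwht(x)
    print(y)  # [4, 2, 0, -2, 0, 2, 0, 2]
    # Applying the transform twice returns the input scaled by N.
    print([v // len(x) for v in fwht(y)] == x)
```

Each stage uses only additions and subtractions, so N log2(N) add/subtract operations replace the N^2 of a direct matrix-vector product.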

2:40, DISPS-L2.6
ARCHITECTURE INDEPENDENT SHORT VECTOR FFTS
F. FRANCHETTI, H. KARNER, S. KRAL, C. UEBERHUBER
This paper introduces a SIMD vectorization for FFTW, the "Fastest Fourier Transform in the West" by Matteo Frigo and Steven Johnson. The new method leads to an architecture-independent short vector SIMD FFT vectorization that utilizes the architecture adaptivity of FFTW. It is based on special FFT kernels (up to size 64 and more) that are utilized by FFTW to compute the whole transform. This vectorization supports all features of complex transforms in FFTW (arbitrary size, dimension and stride of the data vector; in-place and out-of-place transforms) and is fully transparent to the user. It is suitable for arbitrary vector sizes of the underlying hardware.