APPLICATION IMPLEMENTATIONS AND MAPPINGS

Chair: Teresa Meng, Stanford University (USA)

Jacobi SVD Algorithms for Tracking of Nonstationary Signals

Authors:

F. Lorenzelli, University of California at Los Angeles (USA)
K. Yao, University of California at Los Angeles (USA)

Volume 5, Page 3183

Abstract:

In this paper we consider the algorithm for SVD updating based on Jacobi rotations. To overcome the trade-off between accuracy and updating rate intrinsic in the original algorithm, we propose two schemes which improve the overall performance when the rate of change of the data is high. In the "variable forgetting factor" approach, the effective width of the observation window adjusts to the data nonstationarity. The former scheme ensures closeness to convergence at all times, while the latter adapts the response to data variation. We consider applications of the SVD updating algorithm to speech processing: segmentation, adaptive parameter estimation, and glottal closure detection.
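
The link between a forgetting factor and the effective observation width can be illustrated with plain exponential weighting. This is a generic sketch, not the authors' scheme: the names `effective_window` and `weighted_energy` are hypothetical, and the forgetting factor `lam` is held fixed here, whereas the abstract describes one that varies with the data.

```python
def effective_window(lam: float) -> float:
    """Sum of the weights lam**k for k = 0, 1, 2, ... (a geometric series).

    This sum is a common measure of how many past samples effectively
    contribute to an exponentially weighted estimate.
    """
    if not 0.0 <= lam < 1.0:
        raise ValueError("forgetting factor must satisfy 0 <= lam < 1")
    return 1.0 / (1.0 - lam)


def weighted_energy(samples, lam):
    """Exponentially weighted sum of squares, updated one sample at a time."""
    energy = 0.0
    for x in samples:
        # Every older contribution is scaled by lam at each new sample.
        energy = lam * energy + x * x
    return energy
```

Under this weighting, a factor close to 1 gives a long window (smooth but slow to react), and a smaller factor gives a short window (fast but noisy), which is the trade-off an adaptive factor would aim to balance.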

300dpi TIFF Images of pages:

3183 3184 3185 3186

Acrobat PDF file of whole paper:

ic953183.pdf

A 100 MHz Pipelined RLS Adaptive Filter

Authors:

Kalavai J. Raghunath, University of Minnesota (USA)
Keshab K. Parhi, University of Minnesota (USA)

Volume 5, Page 3187

Abstract:

Recently, a new pipelinable PSTAR-RLS algorithm was developed. It was shown to be an effective alternative to the QRD-RLS algorithm when high speeds are required. Using a folding technique, a 4-tap PSTAR-RLS algorithm was implemented on a single VLSI chip. All the operations in the chip are bit-level pipelined. With a 1.2 um CMOS technology this chip is expected to run at 100 MHz. Arithmetic operators based on a redundant number system were used for performance advantage. Apart from a wafer-scale implementation, this is the first single-chip ASIC implementation of an RLS adaptive filter.

300dpi TIFF Images of pages:

3187 3188 3189 3190

Acrobat PDF file of whole paper:

ic953187.pdf

A Complex Arithmetic Digital Signal Processor Using CORDIC Rotators

Authors:

S. Freeman, University of Michigan (USA)
M. O'Donnell, University of Michigan (USA)

Volume 5, Page 3191

Abstract:

A versatile signal processor has been designed that can perform multiple rotations, multiplications and additions within one clock cycle. The computational elements of this processor include four pipelined CORDIC rotators, two pipelined fast multipliers and two adders. A combination of register files, SRAM, and ROM provides on-chip storage for coefficients, running sums and programs. The chip architecture and its applicability to complex-valued signal processing tasks are discussed.

300dpi TIFF Images of pages:

3191 3192 3193 3194

Acrobat PDF file of whole paper:

ic953191.pdf

VLSI Architecture of the Generalized Multi Delay Frequency-Domain Algorithm for Acoustic Echo Cancellation

Authors:

Amer El Helwani, France Telecom-CNET (FRANCE)
Patrice Le Scan, France Telecom-CNET (FRANCE)

Volume 5, Page 3195

Abstract:

Within the context of acoustic echo cancellation, the convergence rate of the NLMS adaptive filtering algorithm is not sufficient when the input signal is strongly correlated (as a speech signal is). The Generalized Multi Delay Frequency-domain (GMDF) algorithm allows a faster convergence rate and faster tracking of variations in the echo path to be identified. In these applications the filter may have a length of several thousand taps at a sampling rate of 16 kHz. This leads to difficulties in real-time implementation of the algorithm. This paper describes an optimized VLSI architecture of a specific circuit which is designed to perform the computationally intensive task of the GMDF algorithm in real time. It can be carried out for long impulse response filters.
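
For reference, the time-domain NLMS baseline that the abstract compares against can be sketched as follows. This is a textbook form of NLMS, not the GMDF algorithm and not the authors' implementation. The function names, the step size `mu`, and the regularizer `eps` are illustrative assumptions.

```python
def fir(x, h):
    """Filter the sequence x with the FIR impulse response h."""
    buf = [0.0] * len(h)  # most recent input sample first
    out = []
    for xn in x:
        buf = [xn] + buf[:-1]
        out.append(sum(hi * bi for hi, bi in zip(h, buf)))
    return out


def nlms_identify(x, d, num_taps, mu=0.5, eps=1e-8):
    """Estimate an FIR echo path from input x and observed signal d."""
    w = [0.0] * num_taps
    buf = [0.0] * num_taps
    for xn, dn in zip(x, d):
        buf = [xn] + buf[:-1]
        y = sum(wi * bi for wi, bi in zip(w, buf))  # current echo estimate
        e = dn - y                                  # residual echo
        norm = eps + sum(b * b for b in buf)        # input energy in the buffer
        # Step normalized by the input energy: the "N" in NLMS.
        w = [wi + mu * e * bi / norm for wi, bi in zip(w, buf)]
    return w
```

Each sample costs on the order of `num_taps` multiply-accumulates for filtering and again for the update, which suggests why filters with thousands of taps at 16 kHz are demanding in real time.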

300dpi TIFF Images of pages:

3195 3196 3197 3198

Acrobat PDF file of whole paper:

ic953195.pdf

A Highly-Parallel DSP Architecture for Image Recognition

Authors:

Hiroyuki Kawai, Mitsubishi Electric Corporation (JAPAN)
Yoshitugu Inoue, Mitsubishi Electric Corporation (JAPAN)
Robert Streitenberger, Mitsubishi Electric Corporation (JAPAN)
Masahiko Yoshimoto, Mitsubishi Electric Corporation (JAPAN)

Volume 5, Page 3199

Abstract:

This paper presents the architecture of a newly developed highly parallel DSP suited for real-time image recognition. The programmable DSP was designed for a variety of image recognition systems, such as computer vision systems, character recognition systems and others. The DSP consists of functional units optimized for image recognition: a SIMD processing core, a hierarchical bus, an Address Generation Unit, Data Memories, a DMAC, a Link Unit, and a Control Unit. The DSP can perform 5x5 spatial filtering of a 512x512 image within 13.1 ms. When the DSP is applied to a Japanese character recognition system, a speed of 924 characters/s can be achieved for feature extraction and feature vector matching. The DSP can be integrated on a single 14.5 x 14.5 mm^2 chip using 0.5 um CMOS technology. In this paper, the key features of the architecture and the new techniques enabling efficient operation of the eight parallel processing units are described. An estimate of the performance of the DSP is also presented.
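
The filtering figure can be turned into an approximate arithmetic rate. The sketch below assumes one multiply-accumulate (MAC) per kernel tap per output pixel and ignores border handling and memory traffic. The helper `mac_rate` and the even split over eight units are illustrative assumptions, not figures from the paper.

```python
def mac_rate(width, height, kernel, seconds):
    """MACs per second needed for a kernel x kernel filter over one image."""
    macs = width * height * kernel * kernel
    return macs / seconds


# Figures quoted in the abstract: 5x5 filter, 512x512 image, 13.1 ms.
rate = mac_rate(512, 512, 5, 13.1e-3)
print(f"{rate / 1e6:.1f} million MAC/s in total")
print(f"{rate / 8 / 1e6:.1f} million MAC/s per unit if split evenly over eight units")
```

Under these assumptions the quoted timing corresponds to roughly 500 million MAC/s overall.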

300dpi TIFF Images of pages:

3199 3200 3201 3202

Acrobat PDF file of whole paper:

ic953199.pdf

Adaptive Jacobi Method for Parallel Singular Value Decompositions

Authors:

Shen-Fu Hsiao, National Sun Yat-Sen University (TAIWAN)

Volume 5, Page 3203

Abstract:

The Jacobi method has been used on special-purpose multi-processor VLSI systems for parallel singular value decomposition (SVD) of dense matrices, and CORDIC processors are often used as the basic processing elements to implement the two-sided rotations, the fundamental operations in the Jacobi method. Recently, generalizations of the original CORDIC algorithm to multi-dimensional spaces have been used in the SVD of complex matrices to achieve faster computation speed. A further speedup of more than a factor of 2 can be gained by gradually refining the resolution of the CORDIC algorithms used in the Jacobi method.
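
A basic two-dimensional CORDIC rotation shows what resolution could mean in this setting. In the sketch below the number of micro-rotations `iterations` stands in for the resolution; that reading, the function name, and the floating-point arithmetic are assumptions for illustration. The multi-dimensional generalizations and the refinement schedule described in the abstract are not reproduced.

```python
import math


def cordic_rotate(x, y, angle, iterations):
    """Rotate (x, y) by `angle` radians using CORDIC micro-rotations.

    Converges for |angle| up to about 1.74 rad. Each step needs only
    shifts and adds in hardware; floats are used here for clarity.
    """
    z = angle    # remaining angle still to be rotated
    gain = 1.0   # accumulated scale factor of the micro-rotations
    for i in range(iterations):
        sigma = 1.0 if z >= 0 else -1.0
        t = 2.0 ** -i  # tangent of the i-th elementary angle
        x, y = x - sigma * y * t, y + sigma * x * t
        z -= sigma * math.atan(t)
        gain *= math.sqrt(1.0 + t * t)
    # Remove the known gain so the result has the input's length.
    return x / gain, y / gain
```

After `n` micro-rotations the residual angle is at most `atan(2**-(n - 1))`, so fewer iterations give a coarser but cheaper rotation. That may be adequate in early Jacobi sweeps, when the matrix is still far from diagonal.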

300dpi TIFF Images of pages:

3203 3204 3205 3206

Acrobat PDF file of whole paper:

ic953203.pdf

PSEUDEC: Implementation of the Computation-Intensive PARTRAN Functionality Using a Dedicated On-Line CORDIC Co-processor

Authors:

Finn T. Moller, Aalborg University
Jack B. Andersen, Grundfos Electronics
Hans R. Jensen, Aalborg University (DENMARK)
Ole Olsen, Aalborg University (DENMARK)
Flemming K. Fink, Aalborg University (DENMARK)

Volume 5, Page 3207

Abstract:

This paper describes PSEUDEC, a dedicated co-processor, and the rationale behind its design. The final goal of our work is to present an advanced digital hearing aid based on parameterized transformation of speech (PARTRAN) as a single-chip solution with low power consumption. The subset of PARTRAN implemented by PSEUDEC performs PSEUdo DEComposition of a 12th-order LPC polynomial. An adapted algorithm, which calculates the amplitude spectrum rather than the power spectrum, displays improved dynamic range compared to a conventional solution suited for DSPs. Highly pipelined CORDIC units optimized for the application replace complex multiplication, trigonometric operations (for $e^{jw}$) and square root (for the 2-norm of a complex vector), exploiting the power of CORDIC operations in advanced DSP algorithms. PSEUDEC uses ON-LINE arithmetic for efficient implementation of operators and for efficient inter-operator communication. PSEUDEC has been implemented using ordinary standard cells.

300dpi TIFF Images of pages:

3207 3208 3209 3210

Acrobat PDF file of whole paper:

ic953207.pdf

Fast Subspace Tracking Using Coarse Grain and Fine Grain Parallelism

Authors:

Daniel Rabideau, United States Air Force
Allan Steinhardt, MIT Lincoln Laboratory (USA)

Volume 5, Page 3211

Abstract:

Subspace tracking is an integral part of many high-resolution adaptive array methods. Unfortunately, the high computational complexity and non-parallel nature of traditional subspace tracking algorithms have deterred their use in real-time systems. In this paper we discuss parallel mappings of the Fast Subspace Tracking Algorithm. The serial complexity of this algorithm is already among the lowest (O(Nr) for N channels and an r-dimensional subspace). We show that even greater reductions in effective complexity can be achieved by mapping our algorithm onto multiple processors. Near-linear speedup is obtained on machines spanning the range from fine-grain systolic arrays to coarse-grain commercially available MPPs.

300dpi TIFF Images of pages:

3211 3212 3213 3214

Acrobat PDF file of whole paper:

ic953211.pdf

An Equalizing and Channel Coding Processor for GSM Terminals

Authors:

Minoru Okamoto, Matsushita Electric Ind. Co. Ltd.
Toshihiro Ishikawa, Matsushita Communication Ind. Co. Ltd.
Shinichi Mauri, Matsushita Electric Ind. Co. Ltd.
Masayuki Yamaski, Matsushita Electric Ind. Co. Ltd.
Katsuhiko Ueda, Matsushita Electric Ind. Co. Ltd.
Nobuo Asano, Matsushita Communication Ind. Co. Ltd. (JAPAN)
Mitsuru Uesugi, Matsushita Communication Ind. Co. Ltd. (JAPAN)
Yoshiko Saitoh, Matsushita Communication Ind. Co. Ltd. (JAPAN)
Yukihiro Fujimoto, Matsushita Communication Ind. Co. Ltd. (JAPAN)
Susumu Furushima, Matsushita Communication Ind. Co. Ltd. (JAPAN)

Volume 5, Page 3215

Abstract:

A new DSP architecture for equalizing, channel coding/decoding and encryption/decryption required by GSM hand-portable terminals is presented. In the DSP, which is called EQCHAN (Equalizer and Channel coding/decoding processor), these tasks are managed in common units, namely the data processing unit (DPU) and the bit manipulation unit (BMU). The LSI that contains EQCHAN was designed using 0.8 um CMOS technology and its die size is 123 $mm^2$. The power consumed in the LSI is 60 mW at 3.6 V under continuous communication mode, which is low enough for a portable terminal. In this paper, we describe the detailed architecture of EQCHAN.

300dpi TIFF Images of pages:

3215 3216 3217 3218

Acrobat PDF file of whole paper:

ic953215.pdf

The MGAP-2: An Advanced, Massively Parallel VLSI Signal Processor

Authors:

Thomas P. Kelliher, Westminster College
Eric S. Gayles, Pennsylvania State University (USA)
Robert M. Owens, Pennsylvania State University (USA)
Mary Jane Irwin, Pennsylvania State University (USA)

Volume 5, Page 3219

Abstract:

The Micro-Grain Array Processor (MGAP) is a family of two-dimensional, micro-grained array processors. The processor cell architecture is extremely compact and simple, ensuring fine granularity, a very high processor density, and programming flexibility. Flexibility is maintained through a programmable interconnect which clusters array cells into larger computational units. In this paper, we discuss the design and optimization issues of MGAP-2, both at the processor array and system levels. Various design strategies and tradeoffs are being investigated at both levels. The reader will see how lessons learned from building and using MGAP-1 have been applied in this new design effort. We also describe our MGAP programming environment and an application example: the two-dimensional discrete cosine transform, a powerful image compression tool.
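
As a reminder of the application example, the two-dimensional DCT can be computed separably: a one-dimensional DCT along every row, then along every column. The sketch below uses the orthonormal DCT-II in plain Python. It is a direct O(n^2)-per-vector reference form and says nothing about how the transform is mapped onto the MGAP array. The function names are hypothetical.

```python
import math


def dct_1d(v):
    """Orthonormal DCT-II of a sequence of numbers."""
    n = len(v)
    out = []
    for k in range(n):
        s = sum(v[i] * math.cos(math.pi * (i + 0.5) * k / n) for i in range(n))
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        out.append(scale * s)
    return out


def dct_2d(block):
    """Separable 2-D DCT: transform the rows, then the columns."""
    rows = [dct_1d(r) for r in block]
    cols = [dct_1d(list(c)) for c in zip(*rows)]
    # Transpose back so that out[u][v] has vertical frequency u.
    return [list(r) for r in zip(*cols)]
```

For a constant block, all of the energy lands in the single DC coefficient `out[0][0]`. This energy compaction is what makes the transform useful for image compression.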

300dpi TIFF Images of pages:

3219 3220 3221 3222

Acrobat PDF file of whole paper:

ic953219.pdf