Abstract: Session ITT-1

ITT-1.1

Fast Implementation of Orthogonal Wavelet Filterbanks
Uwe Meyer-Baese, Julien Buros, Wolfgang Trautmann, Fred Taylor (Dept. E&C Eng. HSDAL University of Florida)

Field-Programmable Logic (FPL) is on the verge of revolutionizing digital signal processing (DSP) in the way that programmable DSP microprocessors did nearly two decades ago. While FPL densities and performance have steadily improved to the point where some DSP solutions can be integrated into a single FPL chip, their use in high-precision, high-bandwidth applications is still limited. In this paper, it is shown that alternative implementation strategies can be found that overcome this precision/bandwidth barrier. The design of length-4 and length-8 Daubechies filters is presented to compare FPL and programmable DSP solutions.
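For reference, the length-4 Daubechies lowpass filter has a closed form. The Python sketch below builds it, derives the matching highpass filter with the usual alternating-flip rule, and runs one analysis level with periodic extension. It is a floating-point illustration of the orthogonal filterbank math only, under the assumption of circular boundary handling; it does not reproduce the paper's FPL architecture, and all function names are hypothetical.

```python
import math


def daubechies4_lowpass():
    """Length-4 Daubechies scaling (lowpass) filter, normalized so sum(h) = sqrt(2)."""
    s3 = math.sqrt(3.0)
    norm = 4.0 * math.sqrt(2.0)
    return [(1 + s3) / norm, (3 + s3) / norm, (3 - s3) / norm, (1 - s3) / norm]


def qmf_highpass(h):
    """Alternating-flip construction of the highpass filter: g[n] = (-1)**n * h[L-1-n]."""
    L = len(h)
    return [(-1) ** n * h[L - 1 - n] for n in range(L)]


def analysis_step(x, h, g):
    """One level of a two-channel analysis bank.

    Filters x with h and g and keeps every second output (downsampling by 2).
    Indices wrap around (periodic extension), which keeps the transform
    orthonormal for even-length inputs at least as long as the filter.
    """
    n = len(x)
    approx, detail = [], []
    for i in range(0, n, 2):
        approx.append(sum(h[k] * x[(i + k) % n] for k in range(len(h))))
        detail.append(sum(g[k] * x[(i + k) % n] for k in range(len(g))))
    return approx, detail
```

Because the filterbank is orthogonal, the energy of the approximation and detail outputs together equals the input energy, and a constant input produces all-zero detail coefficients.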

ITT-1.2

High-Performance FPGA Filters Using Sigma-Delta Modulation Encoding
Chris H Dick (Xilinx Inc., San Jose), Fred Harris (College of Engineering, San Diego State University, San Diego)

This paper investigates an architectural option for constructing high-sample-rate, narrow-band single-rate and multirate filters using Xilinx field-programmable gate array (FPGA) technology. Sigma-delta modulation encoding is applied to the input data to reduce the precision required of the filter's arithmetic units. This is done without compromising signal integrity within the band of interest. The implementation provides significant savings in device logic resources compared with other techniques that provide the same functionality. The sigma-delta preprocessor is described, and its implementation using XC4000 FPGAs is reported. The architecture of the reduced-precision filter is presented and its FPGA realization is described.
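The abstract does not specify the modulator order or output word length. As a minimal sketch, assuming a first-order, 1-bit sigma-delta encoder, the Python example below shows why the encoding reduces arithmetic precision: with a ±1 input stream, every filter tap becomes an add or a subtract of the coefficient instead of a full multiply. The function names are hypothetical, and this is not the Xilinx design described in the paper.

```python
def sigma_delta_1bit(x):
    """First-order 1-bit sigma-delta modulator (assumed design); inputs assumed in [-1, 1]."""
    integrator = 0.0
    feedback = 0.0
    bits = []
    for sample in x:
        # Accumulate the difference between the input and the previous 1-bit output.
        integrator += sample - feedback
        feedback = 1.0 if integrator >= 0.0 else -1.0
        bits.append(feedback)
    return bits


def fir_on_bits(bits, h):
    """FIR filter y[n] = sum_k h[k] * bits[n-k] for a +/-1 input, zero initial state.

    Since every input sample is +1 or -1, each tap is a conditional add or
    subtract of the coefficient; no multiplier is needed.
    """
    y = []
    for n in range(len(bits)):
        acc = 0.0
        for k, coeff in enumerate(h):
            if n - k < 0:
                break  # earlier samples are zero
            acc = acc + coeff if bits[n - k] > 0 else acc - coeff
        y.append(acc)
    return y
```

For a constant input of 0.5, the running average of the 1-bit stream converges to 0.5, which is the property that lets a narrow-band filter recover the in-band signal from the coarse encoding.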

ITT-1.3

AMD 3DNow! Vectorization for Signal Processing Applications
Gwangwoo Choe, Dongho Kim (Advanced Micro Devices)

AMD 3DNow! technology provides a substantial speedup for digital signal processing applications. A set of DSP routines is vectorized with 3DNow! technology. The simplicity of the vector unit makes it easy to convert conventional DSP programs into vector operations, thus shortening the learning curve. The performance gains on typical DSP routines such as FIR, IIR, and FFT indicate that the speedup can reach up to 1.5 times compared with conventional host-based signal processing units. 3D games and multimedia applications also benefit from the technology. The vectorization can be integrated into compilers, making it easier to increase the performance of signal processing applications.
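3DNow! registers hold two packed single-precision values, so one way to vectorize an FIR filter is to compute two adjacent outputs per pass, with each coefficient broadcast to both lanes. The plain-Python emulation below only illustrates that data layout under this assumption; it does not use 3DNow! instructions, and the helper names are hypothetical.

```python
def fir_scalar(x, h):
    """Reference FIR: y[n] = sum_k h[k] * x[n-k], zero initial state."""
    return [sum(h[k] * x[n - k] for k in range(len(h)) if n - k >= 0) for n in range(len(x))]


def fir_two_lane(x, h):
    """Same FIR, but outputs n and n+1 are computed together in two 'lanes'."""
    history = len(h) - 1
    xp = [0.0] * history + list(x)  # zero history so x[n-k] maps to xp[n-k+history]
    y = []
    for n in range(0, len(x), 2):
        acc = [0.0, 0.0]  # one emulated packed accumulator: lane 0 -> y[n], lane 1 -> y[n+1]
        for k, coeff in enumerate(h):
            base = n + history - k
            lane0 = xp[base]
            lane1 = xp[base + 1] if base + 1 < len(xp) else 0.0  # odd-length tail
            # The same coefficient is applied to both lanes (a broadcast multiply-add).
            acc[0] += coeff * lane0
            acc[1] += coeff * lane1
        y.extend(acc)
    return y[:len(x)]  # drop the padding lane if len(x) is odd
```

The two-lane version must match the scalar reference exactly in structure; only the grouping of outputs changes, which is what makes this kind of conversion straightforward.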

ITT-1.4

Radix-4 FFT Implementation Using SIMD Multimedia Instructions
Kouhei Nadehara, Takashi Miyazaki, Ichiro Kuroda (NEC Corporation)

In this paper, a fast radix-4 complex FFT implementation using 4-parallel SIMD instructions is presented. Four radix-4 butterflies are calculated in parallel at all stages by loading four consecutive elements into a register. At the last stage, every four elements are packed into a register and calculated in parallel. This regular data flow enables higher parallelism and reduces the overhead of data format conversion. The implementation on the V830R processor, which has a 4-parallel SIMD-type multimedia instruction set, achieves practical performance quite competitive with high-end parallel DSPs. The multiply-accumulate instructions with symmetrical rounding introduced in the V830R processor are effective in maintaining FFT accuracy.
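As a scalar reference for the radix-4 butterfly, the recursive decimation-in-time sketch below splits the input into four interleaved subsequences and combines them with the twiddle factors W_N^q, W_N^2q, W_N^3q and the multipliers ±1 and ±j. It assumes a power-of-4 length and makes no attempt to reproduce the paper's 4-way SIMD data layout or the V830R's rounding behavior; the function name is hypothetical.

```python
import cmath


def fft_radix4(x):
    """Recursive radix-4 decimation-in-time FFT; len(x) must be a power of 4."""
    n = len(x)
    if n == 1:
        return [complex(x[0])]
    if n % 4 != 0:
        raise ValueError("length must be a power of 4")
    quarter = n // 4
    # DFTs of the four interleaved subsequences x[m], x[m+4], x[m+8], ...
    f = [fft_radix4(x[m::4]) for m in range(4)]
    out = [0j] * n
    for q in range(quarter):
        w = cmath.exp(-2j * cmath.pi * q / n)  # twiddle factor W_N^q
        a = f[0][q]
        b = w * f[1][q]
        c = w * w * f[2][q]
        d = w * w * w * f[3][q]
        # Radix-4 butterfly: the inner DFT of size 4 uses only +/-1 and +/-j.
        out[q] = a + b + c + d
        out[q + quarter] = a - 1j * b - c + 1j * d
        out[q + 2 * quarter] = a - b + c - d
        out[q + 3 * quarter] = a + 1j * b - c - 1j * d
    return out
```

The four outputs of each butterfly in the inner loop are independent of the other values of q, which is the regularity that allows several butterflies to be evaluated side by side in SIMD registers.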

ITT-1.5

Some Fast Speech Processing Algorithms using AltiVec Technology
Sanjay M Joshi (University of Maryland Baltimore County, Baltimore, MD, USA), Pradeep K Dubey (IBM Research Division, New Delhi, India)

The AltiVec technology is a SIMD (Single Instruction Multiple Data) extension to the PowerPC architecture. It is intended to provide architectural support for improving the performance of various image and signal processing applications, including speech processing, on a general-purpose processor implementation such as the PowerPC line of processors. In this paper, we implement some common speech processing algorithms on the AltiVec architecture. The algorithms discussed are autocorrelation computation, linear prediction coefficient computation via the Levinson-Durbin method and the Schur recursion, and part of the GSM speech compression system. AltiVec obtained significant speedups on all these algorithms compared with the scalar PowerPC implementation. We also found that additional speedup was achievable by porting to new, more SIMD-friendly algorithms.
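A scalar reference for two of the algorithms named above, autocorrelation and the Levinson-Durbin recursion, is sketched below in Python. It assumes the convention A(z) = 1 + a_1 z^-1 + ... + a_p z^-p and returns the reflection coefficients as a by-product. It is only a baseline against which a SIMD version could be checked, not the AltiVec code from the paper, and the function names are hypothetical.

```python
def autocorrelation(x, max_lag):
    """Biased autocorrelation r[lag] = sum_i x[i] * x[i + lag] for lag = 0..max_lag."""
    n = len(x)
    return [sum(x[i] * x[i + lag] for i in range(n - lag)) for lag in range(max_lag + 1)]


def levinson_durbin(r, order):
    """Solve the LPC normal equations for A(z) = 1 + a[1] z^-1 + ... + a[order] z^-order.

    Returns (a, prediction_error, reflection_coefficients).
    """
    a = [1.0] + [0.0] * order
    err = r[0]
    reflection = []
    for i in range(1, order + 1):
        # Correlation between the current predictor and the next lag.
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k = -acc / err
        reflection.append(k)
        prev = a[:]
        # Order update: a_j <- a_j + k * a_{i-j}, then append k as the new last coefficient.
        for j in range(1, i):
            a[j] = prev[j] + k * prev[i - j]
        a[i] = k
        err *= 1.0 - k * k
    return a, err, reflection
```

For example, the autocorrelation sequence r = [1, 0.5, 0.25] of a first-order process gives a = [1, -0.5, 0] and a prediction error of 0.75 at order 2; the second reflection coefficient is zero because no second-order term is needed.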

ITT-1.6

A New Parallel DSP with Short-Vector Memory Architecture
Jose Fridman, William C Anderson (Analog Devices, Inc.)

This paper presents a new, highly parallel DSP architecture based on a short-vector memory system developed at Analog Devices, Inc. For the first time in an embedded processor, this DSP incorporates a number of techniques found in general-purpose computing, such as branch prediction, a deep, fully interlocked pipeline, and SIMD instruction execution. Its short-vector, high-bandwidth memory system enables it to deliver sustained performance close to its peak computational rates of 1.5 GFLOPS (32-bit floating-point) or 6 BOPS (16-bit fixed-point).

ITT-1.7

FPGA Implementation of a Nonlinear Two Dimensional Fuzzy Filter
Justin G Delva, Ali M Reza (Department of Electrical Engineering and Computer Science, University of Wisconsin-Milwaukee), Robert D Turney (Xilinx Inc.)

Nonlinear filtering has found many practical applications in digital signal and image processing. The computational complexity of these filtering algorithms makes them difficult to implement in real-time hardware. One of these nonlinear filters, which is based on fuzzy classification of each pixel to subgroups of its neighboring pixels, is considered here for hardware implementation. The filter's criteria are based on the local context, which forms the basis of the fuzzy rule. The filtering algorithm is slightly modified for implementation on a Xilinx Virtex-series FPGA for real-time processing of image sequences. Implementation details and recommendations for further improvement are discussed. Results of a simulation example from the proposed hardware implementation are also presented.
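The abstract does not give the filter's fuzzy rules. As a loosely related illustration only, the sketch below implements a generic membership-weighted 3×3 smoothing filter: each neighbor is weighted by a triangular membership function of its difference from the center pixel, so similar pixels are averaged while pixels across a strong edge receive zero weight. This is a swapped-in technique, not the paper's classification-based filter; the `spread` parameter (assumed positive) and the function names are hypothetical.

```python
def membership(diff, spread):
    """Triangular membership: 1 at diff = 0, falling linearly to 0 at |diff| >= spread."""
    return max(0.0, 1.0 - abs(diff) / spread)


def fuzzy_filter_3x3(img, spread=32.0):
    """Membership-weighted 3x3 average; border pixels are copied unchanged.

    img is a list of rows of grayscale values; spread must be > 0.
    """
    h, w = len(img), len(img[0])
    out = [row[:] for row in img]
    for r in range(1, h - 1):
        for c in range(1, w - 1):
            center = img[r][c]
            num = 0.0
            den = 0.0
            for dr in (-1, 0, 1):
                for dc in (-1, 0, 1):
                    p = img[r + dr][c + dc]
                    mu = membership(p - center, spread)  # degree of "similar to center"
                    num += mu * p
                    den += mu
            out[r][c] = num / den  # den >= 1 because the center pixel always has mu = 1
    return out
```

Each output pixel depends only on a fixed 3×3 window with simple arithmetic, which is the kind of regular, local structure that maps naturally onto a pipelined FPGA datapath.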


Last Update: February 4, 1999 Ingo Höntsch