Abstract: Session ITT-1 |
|
ITT-1.1
|
Fast Implementation of Orthogonal Wavelet Filterbanks
Uwe Meyer-Baese,
Julien Buros,
Wolfgang Trautmann,
Fred Taylor (Dept. E&C Eng. HSDAL University of Florida)
Field-Programmable Logic (FPL) is on the verge of
revolutionizing digital signal processing (DSP) in the manner
that programmable DSP microprocessors did nearly two decades
ago. While FPL densities and performance have steadily improved
to the point where some DSP solutions can be integrated into a
single FPL chip, their use in high-precision, high-bandwidth
applications remains limited. This paper shows that alternative
implementation strategies can be found which overcome the
precision/bandwidth barrier. The design of Daubechies length-4
and length-8 filters is presented to compare FPL and
programmable DSP solutions.
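
For readers unfamiliar with the filters named above, the following is a minimal floating-point sketch of a one-level Daubechies length-4 orthogonal filter bank. It is not the FPL design from the paper: the periodic boundary handling and the function names `analyze` and `synthesize` are assumptions made for illustration, and no fixed-point or hardware aspects are modeled.

```python
import math

SQRT3 = math.sqrt(3.0)
NORM = 4.0 * math.sqrt(2.0)

# Daubechies length-4 low-pass (scaling) coefficients.
H = [(1 + SQRT3) / NORM, (3 + SQRT3) / NORM,
     (3 - SQRT3) / NORM, (1 - SQRT3) / NORM]
# High-pass (wavelet) coefficients: g[k] = (-1)^k * h[3 - k].
G = [(-1) ** k * H[3 - k] for k in range(4)]


def analyze(x):
    """One analysis level: filter and downsample by 2 (periodic extension)."""
    n = len(x)
    if n < 4 or n % 2:
        raise ValueError("length must be even and at least 4")
    approx, detail = [], []
    for i in range(n // 2):
        window = [x[(2 * i + k) % n] for k in range(4)]
        approx.append(sum(h * v for h, v in zip(H, window)))
        detail.append(sum(g * v for g, v in zip(G, window)))
    return approx, detail


def synthesize(approx, detail):
    """Inverse of analyze: the transpose of the orthogonal analysis step."""
    n = 2 * len(approx)
    x = [0.0] * n
    for i, (a, d) in enumerate(zip(approx, detail)):
        for k in range(4):
            x[(2 * i + k) % n] += H[k] * a + G[k] * d
    return x
```

Because the filter bank is orthogonal, `synthesize(*analyze(x))` reproduces `x` up to rounding error, and the detail coefficients of a constant signal are zero.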
|
ITT-1.2
|
High-Performance FPGA Filters Using Sigma-Delta Modulation Encoding
Chris H Dick (Xilinx Inc., San Jose),
Fred Harris (College of Engineering, San Diego State University, San Diego)
This paper investigates an architectural option for
constructing high sample-rate narrow-band single rate
and multi-rate filters using Xilinx field programmable
gate array (FPGA) technology. Sigma-delta modulation
encoding is applied to the input data in order to
effect a reduction in the precision of the arithmetic
units in the filter. This is done without compromising
the signal integrity within the band of interest. The
implementation provides significant savings in device
logic resources in comparison to other techniques that
provide the same functionality. The sigma-delta
pre-processor is described and its implementation using
XC4000 FPGAs is reported. The architecture of the
reduced precision filter is presented and its FPGA
realization described.
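
The general idea of sigma-delta encoding can be sketched as follows. This is only an illustration: the abstract does not state the modulator order or the quantizer width, so the first-order loop, the 1-bit (+1/-1) quantizer, the input range of -1 to 1, and the name `sigma_delta_encode` are all assumptions.

```python
def sigma_delta_encode(samples):
    """First-order sigma-delta modulator with a 1-bit (+1/-1) quantizer.

    Assumes every input sample lies in the range [-1, 1].
    """
    integrator = 0.0
    previous_output = 0.0
    encoded = []
    for x in samples:
        # Accumulate the difference between the input and the fed-back output.
        integrator += x - previous_output
        # Coarse quantizer: keep only the sign of the integrator state.
        previous_output = 1.0 if integrator >= 0.0 else -1.0
        encoded.append(previous_output)
    return encoded
```

Each output sample carries one bit, yet the running average of the stream tracks the input because the quantization error is pushed toward high frequencies. A narrow-band filter that follows the encoder can therefore operate on low-precision data.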
|
ITT-1.3
|
AMD 3DNow! Vectorization for Signal Processing Applications
Gwangwoo Choe,
Dongho Kim (Advanced Micro Devices)
AMD 3DNow! Technology provides substantial speedup for
Digital Signal Processing applications. A set of DSP routines is
vectorized with the 3DNow! technology. The simplicity of the
vector unit makes it easier to convert conventional DSP
programs into vector operations, thus reducing the learning curve.
The performance gain on typical DSP routines such as FIR,
IIR and FFT indicates that the speedup can reach 1.5 times
compared with conventional host-based signal processing
units. 3D games and multimedia applications benefit from the
technology. The vectorization can be integrated into compilers
to make it easier to improve the performance of signal
processing applications.
|
ITT-1.4
|
Radix-4 FFT Implementation Using SIMD Multimedia Instructions
Kouhei Nadehara,
Takashi Miyazaki,
Ichiro Kuroda (NEC Corporation)
In this paper, a fast radix-4 complex FFT implementation using
4-parallel SIMD instructions is presented.
Four radix-4 butterflies are calculated in parallel at all stages by
loading four consecutive elements into a register.
At the last stage, every four elements are packed into a register
and calculated in parallel.
This regular data flow enables higher parallelism and
an overhead reduction in data format conversion.
The implementation on the V830R processor,
which has a 4-parallel SIMD-type multimedia instruction set,
achieves practical performance quite competitive with high-end parallel DSPs.
Multiply-accumulate instructions with
symmetrical rounding, introduced in the V830R processor, are effective
in maintaining FFT accuracy.
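
As background, a radix-4 butterfly combines four sub-transforms using only the trivial factors 1, -1, j and -j once the twiddle factors have been applied. The sketch below is a plain recursive decimation-in-time version in floating point. It does not model the SIMD register packing, the fixed-point rounding, or the V830R instruction set described in the paper, and the names `fft_radix4` and `dft_direct` are illustrative assumptions.

```python
import cmath


def fft_radix4(x):
    """Recursive radix-4 decimation-in-time FFT (length must be a power of 4)."""
    n = len(x)
    if n == 1:
        return [complex(x[0])]
    if n % 4:
        raise ValueError("length must be a power of 4")
    q = n // 4
    # Transform the four interleaved sub-sequences x[r], x[r+4], x[r+8], ...
    subs = [fft_radix4(x[r::4]) for r in range(4)]
    out = [0j] * n
    for k in range(q):
        w = cmath.exp(-2j * cmath.pi * k / n)  # twiddle factor W_n^k
        a = subs[0][k]
        b = w * subs[1][k]
        c = w * w * subs[2][k]
        d = w * w * w * subs[3][k]
        # Radix-4 butterfly: only additions and multiplications by +/-1, +/-j.
        out[k] = a + b + c + d
        out[k + q] = a - 1j * b - c + 1j * d
        out[k + 2 * q] = a - b + c - d
        out[k + 3 * q] = a + 1j * b - c - 1j * d
    return out


def dft_direct(x):
    """Direct O(n^2) DFT, used only as a reference for checking."""
    n = len(x)
    return [sum(x[m] * cmath.exp(-2j * cmath.pi * k * m / n) for m in range(n))
            for k in range(n)]
```

In this sketch the four butterfly outputs for a given `k` are computed one after another. The paper instead computes four butterflies at once by holding four consecutive elements in one SIMD register.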
|
ITT-1.5
|
Some Fast Speech Processing Algorithms using AltiVec Technology
Sanjay M Joshi (University of Maryland Baltimore County, Baltimore, MD, USA),
Pradeep K Dubey (IBM Research Division, New Delhi, India)
The AltiVec technology is a SIMD (Single Instruction Multiple Data)
extension to PowerPC architecture. It is intended to provide architectural
support for performance improvement of various image and signal processing
applications, including speech processing, on a general-purpose processor
implementation such as the PowerPC line of processors. In this paper we
have implemented some of the common speech processing algorithms on
the AltiVec architecture. The algorithms discussed in this paper are
autocorrelation computation, linear prediction coefficients computation
via the Levinson-Durbin method and the Schur recursion, and part of the GSM
speech compression system. AltiVec obtained significant speedups on all these
algorithms, compared to the scalar PowerPC implementation. We also found
that additional speedup was achievable by porting to new, more
SIMD-friendly algorithms.
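
Two of the algorithms named above, autocorrelation and the Levinson-Durbin recursion, can be sketched in scalar form as follows. This is a textbook floating-point version, not the AltiVec code from the paper. The function names and the sign convention (prediction-error filter with a[0] = 1) are assumptions chosen for illustration.

```python
def autocorrelation(x, max_lag):
    """Unnormalized autocorrelation r[0..max_lag] of a finite sequence."""
    n = len(x)
    return [sum(x[i] * x[i + lag] for i in range(n - lag))
            for lag in range(max_lag + 1)]


def levinson_durbin(r, order):
    """Solve the Toeplitz normal equations for linear prediction.

    Returns (a, err), where a = [1, a1, ..., a_order] is the
    prediction-error filter and err is the final prediction error energy.
    """
    a = [1.0] + [0.0] * order
    err = float(r[0])
    for m in range(1, order + 1):
        if err <= 0.0:
            raise ValueError("autocorrelation is not positive definite")
        acc = r[m] + sum(a[i] * r[m - i] for i in range(1, m))
        k = -acc / err  # reflection coefficient for stage m
        updated = a[:]
        for i in range(1, m):
            updated[i] = a[i] + k * a[m - i]
        updated[m] = k
        a = updated
        err *= 1.0 - k * k
    return a, err
```

The inner sums in both functions are dot products over contiguous data, which is the kind of operation a SIMD unit can process several elements at a time. The update of the coefficients at each stage depends on the previous stage, so the recursion itself is sequential across stages.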
|
ITT-1.6
|
A New Parallel DSP with Short-Vector Memory Architecture
Jose Fridman,
William C Anderson (Analog Devices, Inc.)
This paper presents a new highly parallel DSP architecture based on a short-vector memory system developed at Analog Devices, Inc. This DSP incorporates, for the first time in an embedded processor, a number of techniques found in general-purpose computing, such as branch prediction, a deep and fully interlocked pipeline, and SIMD instruction execution. By means of its short-vector high-bandwidth memory system, it is able to deliver sustained performance that is close to its peak computational rates of 1.5 GFLOPS (32-bit floating-point) or 6 BOPS (16-bit fixed-point).
|
ITT-1.7
|
FPGA Implementation of a Nonlinear Two Dimensional Fuzzy Filter
Justin G Delva,
Ali M Reza (Department of Electrical Engineering and Computer Science, University of Wisconsin-Milwaukee),
Robert D Turney (Xilinx Inc.)
Nonlinear filtering has found many practical
applications in digital signal and image processing.
The computational complexity of these filtering
algorithms makes them difficult to implement in
real-time hardware. One of these nonlinear
filters, which is based on fuzzy classification of
each pixel to subgroups of its neighboring pixels, is
considered here for hardware implementation. The
criteria of this filter are based on the local context,
which forms the basis of the fuzzy rule. The filtering
algorithm is slightly modified for implementation
in a Xilinx Virtex-series FPGA for real-time
processing of image sequences. Implementation
details and recommendations for further
improvement are discussed. The result of a simulation
example from the proposed hardware
implementation is also presented.
|
|