Authors:
Uwe Meyer-Baese,
Julien Buros,
Wolfgang Trautmann,
Fred Taylor,
Page (NA) Paper number 2107
Abstract:
Field-Programmable Logic (FPL) is on the verge of revolutionizing digital
signal processing (DSP) in the manner that programmable DSP microprocessors
did nearly two decades ago. While FPL densities and performance have
steadily improved to the point where some DSP solutions can be integrated
into a single FPL chip, they still have limited use in high-precision,
high-bandwidth applications. In this paper it is shown that alternative
implementation strategies can be found which overcome the precision/bandwidth
barrier. The design of Daubechies length-4 and length-8 filters is presented
to compare FPL and programmable DSP solutions.
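For orientation, the Daubechies length-4 low-pass filter named in the abstract can be sketched as a plain direct-form FIR. This is only a floating-point reference under stated assumptions: the coefficients are the standard closed-form D4 values, and the function names daub4_lowpass and fir are hypothetical. It does not reproduce the FPL implementation strategies of the paper.

```python
import math


def daub4_lowpass():
    """Standard Daubechies length-4 scaling (low-pass) coefficients."""
    s3 = math.sqrt(3.0)
    norm = 4.0 * math.sqrt(2.0)
    return [(1 + s3) / norm, (3 + s3) / norm, (3 - s3) / norm, (1 - s3) / norm]


def fir(x, h):
    """Direct-form FIR: y[n] = sum_k h[k] * x[n-k], with zero initial state."""
    y = []
    for n in range(len(x)):
        acc = 0.0
        for k, hk in enumerate(h):
            if n - k >= 0:
                acc += hk * x[n - k]
        y.append(acc)
    return y
```

The coefficients sum to sqrt(2) and have unit energy, which is a quick sanity check for any fixed-point version derived from them.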
Authors:
Chris H Dick,
Fred Harris,
Page (NA) Paper number 1195
Abstract:
This paper investigates an architectural option for constructing high
sample-rate narrow-band single rate and multi-rate filters using Xilinx
field programmable gate array (FPGA) technology. Sigma-delta modulation
encoding is applied to the input data in order to effect a reduction
in the precision of the arithmetic units in the filter. This is done
without compromising the signal integrity within the band of interest.
The implementation provides significant savings in device logic resources
in comparison to other techniques that provide the same functionality.
The sigma-delta pre-processor is described and its implementation using
XC4000 FPGAs is reported. The architecture of the reduced precision
filter is presented and its FPGA realization described.
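The idea of trading word length for sample rate can be illustrated with a first-order, 1-bit sigma-delta modulator. This is a minimal sketch, not the pre-processor described in the paper: the modulator order, the 1-bit quantizer, and the name sigma_delta_1bit are assumptions made for illustration.

```python
def sigma_delta_1bit(x):
    """First-order sigma-delta modulator (illustrative assumption).

    x holds samples in [-1, 1]; the output is a list of +1.0 / -1.0 values
    whose local average tracks the input.
    """
    integrator = 0.0
    feedback = 0.0  # previous quantizer output
    out = []
    for sample in x:
        # Accumulate the error between the input and the fed-back output.
        integrator += sample - feedback
        # 1-bit quantizer.
        feedback = 1.0 if integrator >= 0.0 else -1.0
        out.append(feedback)
    return out
```

Because each output sample is a single bit, a downstream filter tap reduces to an add or subtract of its coefficient, which is the kind of arithmetic saving the abstract refers to.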
Authors:
Dongho Kim,
Gwangwoo Choe,
Page (NA) Paper number 2248
Abstract:
AMD 3DNow! Technology provides substantial speedup for Digital Signal
Processing applications. A set of DSP routines is vectorized with the
3DNow! technology. The simplicity of the vector unit makes it easier
to convert the conventional DSP programs into vector operations, thus
reducing the learning curve. The performance gain from typical DSP routines
such as FIR, IIR and FFT indicates that the speedup can reach up to
1.5 compared to the conventional host-based signal processing units.
3D games and multimedia applications benefit from the technology. The
vectorization can be integrated into compilers for the ease of use
in increasing the performance of the signal processing applications.
Authors:
Kouhei Nadehara,
Takashi Miyazaki,
Ichiro Kuroda,
Page (NA) Paper number 2264
Abstract:
In this paper, a fast radix-4 complex FFT implementation using 4-parallel
SIMD instructions is presented. Four radix-4 butterflies are calculated
in parallel at all stages by loading 4 consecutive elements into a
register. At the last stage, every 4 elements are packed into a register
and calculated in parallel. This regular data flow enables higher parallelism
and an overhead reduction in data format conversion. The implementation
on the V830R processor, which has a 4-parallel SIMD-type multimedia
instruction set, achieves practical performance quite competitive with
high-end parallel DSPs. Multiply-accumulate instructions with symmetrical
rounding introduced to the V830R processor are effective in maintaining
FFT accuracy.
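The basic operation behind a radix-4 FFT is the 4-point butterfly, which needs only additions and multiplications by -j. The scalar sketch below shows one such butterfly without twiddle factors; the function name radix4_butterfly is hypothetical, and the 4-parallel SIMD packing described in the abstract is not modeled.

```python
def radix4_butterfly(a, b, c, d):
    """4-point DFT of the complex inputs (a, b, c, d).

    Returns (X0, X1, X2, X3) with X[k] = sum_n x[n] * (-1j) ** (n * k).
    """
    t0 = a + c
    t1 = a - c
    t2 = b + d
    t3 = (b - d) * -1j  # multiplying by -j is a swap and negate in hardware
    return (t0 + t2, t1 + t3, t0 - t2, t1 - t3)
```

A SIMD version would apply the same adds and subtracts to four independent butterflies at once, one per register lane.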
Authors:
Sanjay M Joshi, University of Maryland Baltimore County, Baltimore, MD, USA (USA)
Pradeep K Dubey, IBM Research Division, New Delhi, India (India)
Page (NA) Paper number 1477
Abstract:
The AltiVec technology is a SIMD (Single Instruction Multiple Data)
extension to the PowerPC architecture. It is intended to provide architectural
support for performance improvement of various image and signal processing
applications, including speech processing, on general-purpose processor
implementations such as the PowerPC line of processors. In this paper
we implement some of the common speech processing algorithms
on the AltiVec architecture. The algorithms discussed in this paper are
autocorrelation computation, linear prediction coefficient computation
via the Levinson-Durbin method and the Schur recursion, and part of the GSM
speech compression system. AltiVec obtained significant speedups on
all these algorithms, compared to the scalar PowerPC implementation.
We also found that additional speedup was achievable by porting to
new, more SIMD-friendly algorithms.
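Two of the algorithms named in the abstract, autocorrelation and the Levinson-Durbin recursion, can be sketched in scalar form as follows. This is a textbook reference version under assumed conventions (the predictor is x[n] ~= sum_k a[k] * x[n-k], and the names autocorr and levinson_durbin are hypothetical). It is not the AltiVec code from the paper.

```python
def autocorr(x, max_lag):
    """Autocorrelation r[lag] = sum_i x[i] * x[i + lag] for lag = 0..max_lag."""
    n = len(x)
    return [sum(x[i] * x[i + lag] for i in range(n - lag))
            for lag in range(max_lag + 1)]


def levinson_durbin(r, order):
    """Solve the LPC normal equations from autocorrelation values r[0..order].

    Returns (a, err): a[k-1] is the predictor coefficient for x[n-k], and err
    is the final prediction error energy.
    """
    a = [0.0] * (order + 1)  # a[0] is unused
    err = r[0]
    for i in range(1, order + 1):
        # Reflection coefficient for stage i.
        acc = r[i] - sum(a[j] * r[i - j] for j in range(1, i))
        k = acc / err
        new_a = a[:]
        new_a[i] = k
        for j in range(1, i):
            new_a[j] = a[j] - k * a[i - j]
        a = new_a
        err *= 1.0 - k * k
    return a[1:], err
```

The inner sums in both functions are dot products, which is why autocorrelation maps naturally onto SIMD multiply-accumulate, while the stage-to-stage dependency in the recursion is harder to vectorize.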
Authors:
Jose Fridman,
William C Anderson,
Page (NA) Paper number 2317
Abstract:
This paper presents a new highly-parallel DSP architecture based on
a short-vector memory system developed at Analog Devices, Inc. This
DSP incorporates for the first time in an embedded processor a number
of techniques found in general-purpose computing, such as branch prediction,
a deep, fully interlocked pipeline, and SIMD instruction execution.
By means of its short-vector high-bandwidth memory system it is able
to deliver sustained performance that is close to its peak computational
rates of 1.5 GFLOPS (32-bit floating-point), or 6 BOPS (16-bit fixed-point).
Authors:
Justin G.R. Delva, Department of Electrical Engineering and Computer Science, University of Wisconsin-Milwaukee (USA)
Ali M Reza, Department of Electrical Engineering and Computer Science, University of Wisconsin-Milwaukee (USA)
Robert D Turney,
Page (NA) Paper number 2110
Abstract:
Nonlinear filtering has found many practical applications in digital
signal and image processing. The computational complexity of these filtering
algorithms makes them difficult to implement in real-time hardware.
One of these nonlinear filters, which is based on fuzzy classification
of each pixel to subgroups of its neighboring pixels, is considered
here for hardware implementation. The criteria of this filter are based
on the local context, which forms the basis of the fuzzy rule. The filtering
algorithm is slightly modified for implementation on a Xilinx Virtex
series FPGA for real-time processing of image sequences. Implementation
details and recommendations for further improvement are discussed.
Results of a simulation example from the proposed hardware implementation
are also presented.