Abstract: Session DISPS-2

DISPS-2.1
Hybrid Multiplier/CORDIC Unit for Online Handwriting Recognition
Stephen McInerney (DSP Group, Dept. of Electronic & Electrical Engineering, University College Dublin),
Richard B Reilly (Dept. of Electronic & Electrical Engineering, University College Dublin)
Traditionally, Online Handwriting Recognition (OHR) implementations have used general-purpose
processor architectures. The pre-processing step of OHR comprises regular array-based
tasks such as normalisation, feature extraction and segmentation. However, standard processor
architectures cannot efficiently support the varied arithmetic operations
required by pre-processing, so these tasks appear well suited to custom hardware
acceleration. CORDIC offers all the elementary functions required for pre-processing
but is inefficient for linear-mode operations (multiplication/division) because of its
serial nature. A hybrid Multiplier/CORDIC architecture is proposed in which a fast
iterative multiplier/MAC shares hardware with a serial CORDIC unit. This multiplier
retires 6 bits per cycle with little additional hardware. The hybrid offers
improved general performance for signal-processing applications and is targeted at
the pre-processing task of OHR. Performance results are included.
http://wwwdsp.ucd.ie
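The abstract contrasts CORDIC's one-bit-per-iteration linear mode with an iterative multiplier that retires 6 bits per cycle. The Python sketch below illustrates that contrast in floating point rather than fixed-point hardware; the function names, the 32-iteration count and the 24-bit operand width are illustrative assumptions, not details of the authors' unit. `cordic_rotate` is circular-mode CORDIC (cosine/sine), `cordic_multiply` is linear-mode CORDIC, and `radix64_multiply` consumes one 6-bit digit of the multiplier per cycle.

```python
import math


def cordic_rotate(theta, iterations=32):
    """Circular-mode CORDIC: return (cos(theta), sin(theta)) for |theta| <= pi/2."""
    angles = [math.atan(2.0 ** -i) for i in range(iterations)]  # elementary angles
    gain = 1.0
    for i in range(iterations):
        gain /= math.sqrt(1.0 + 2.0 ** (-2 * i))  # aggregate CORDIC gain K
    x, y, z = gain, 0.0, theta  # start pre-scaled so the result has unit length
    for i in range(iterations):
        d = 1.0 if z >= 0 else -1.0
        # Shift-and-add micro-rotation by +/- atan(2^-i)
        x, y = x - d * y * 2.0 ** -i, y + d * x * 2.0 ** -i
        z -= d * angles[i]
    return x, y


def cordic_multiply(x, z, iterations=32):
    """Linear-mode CORDIC: return x*z for |z| < 2, resolving one bit of z per iteration."""
    y = 0.0
    for i in range(iterations):
        d = 1.0 if z >= 0 else -1.0
        y += d * x * 2.0 ** -i  # y + x*z stays invariant across iterations
        z -= d * 2.0 ** -i
    return y


def radix64_multiply(a, b, width=24):
    """Unsigned multiply retiring 6 bits of b per cycle; returns (product, cycles)."""
    assert 0 <= b < (1 << width), "multiplier must fit in `width` bits"
    product, cycles = 0, 0
    for shift in range(0, width, 6):
        digit = (b >> shift) & 0x3F       # next 6-bit digit of the multiplier
        product += (a * digit) << shift   # one partial product accumulated per cycle
        cycles += 1
    return product, cycles
```

For a 24-bit multiplier operand the radix-64 loop needs 4 cycles, whereas linear-mode CORDIC needs roughly one iteration per bit of precision, which is the gap the hybrid is meant to close. How the paper shares hardware between the two datapaths is not described in the abstract and is not modelled here.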

DISPS-2.2
Low-Power Bit-Serial Viterbi Decoder for Next Generation Wide-Band CDMA Systems
Hiroshi Suzuki (Kawasaki Steel Corporation, LSI Division),
Yun-Nan Chang,
Keshab K Parhi (University of Minnesota, Minneapolis)
This paper presents a low-power bit-serial Viterbi
decoder chip with coding rate r=1/3 and
constraint length K=9 (256 states). The chip has
been implemented in 0.5 µm three-layer-metal CMOS
technology and is targeted at high-speed convolutional
decoding for next-generation wireless applications
such as wide-band CDMA mobile systems and wireless
ATM LANs. The chip is expected to operate at 20 Mbps
at 3.3 V and at 2 Mbps at 1.8 V. The
Add-Compare-Select (ACS) units have been designed
using bit-serial arithmetic, which makes it
feasible to execute all 256 ACS operations in parallel.
For trace-back operations, we have developed a novel
power-efficient scheme and an application-specific
memory, designed around the fact that write operations
must store 256 bits simultaneously whereas read
operations need to access only one bit. We estimate
that the chip dissipates only 10 mW when operating
at 2 Mbps at 1.8 V.
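The add-compare-select (ACS) recursion and trace-back mentioned above can be sketched as a hard-decision Viterbi decoder. This Python sketch is an illustration under stated assumptions, not the chip's design: it is exercised with the small K=3, rate-1/2 code with generator taps 7 and 5 (octal) so the trellis stays readable, although the functions accept any K and any number of generators (K=9 gives the 256 states of the paper). Trace-back here simply stores every survivor decision rather than using the paper's power-efficient scheme, and it assumes the message is flushed with K-1 zero tail bits. Function names are hypothetical.

```python
def conv_encode(bits, generators, k):
    """Rate-1/n convolutional encoder; each generator is a k-bit tap mask."""
    state, out = 0, []
    for bit in bits:
        reg = (bit << (k - 1)) | state  # newest input bit in the MSB
        out.extend(bin(reg & g).count("1") & 1 for g in generators)
        state = reg >> 1  # keep the k-1 most recent bits
    return out


def viterbi_decode(symbols, generators, k):
    """Hard-decision Viterbi decoding with a Hamming branch metric."""
    n_states = 1 << (k - 1)
    n = len(generators)
    inf = float("inf")
    metrics = [0] + [inf] * (n_states - 1)  # the encoder starts in state 0
    history = []  # per step: (predecessor, input bit) chosen for each state
    for t in range(0, len(symbols), n):
        received = symbols[t:t + n]
        new_metrics = [inf] * n_states
        survivors = [(0, 0)] * n_states
        for state in range(n_states):
            if metrics[state] == inf:
                continue  # state not yet reachable
            for bit in (0, 1):
                reg = (bit << (k - 1)) | state
                nxt = reg >> 1
                expected = [bin(reg & g).count("1") & 1 for g in generators]
                branch = sum(e != r for e, r in zip(expected, received))
                candidate = metrics[state] + branch  # add
                if candidate < new_metrics[nxt]:     # compare
                    new_metrics[nxt] = candidate     # select
                    survivors[nxt] = (state, bit)
        metrics = new_metrics
        history.append(survivors)
    # Trace back from state 0, assuming k-1 zero tail bits flushed the encoder.
    state, decoded = 0, []
    for survivors in reversed(history):
        state, bit = survivors[state]
        decoded.append(bit)
    return decoded[::-1]
```

The inner loop over all states is the part a hardware decoder parallelises; with K=9 it would run 256 ACS operations per received symbol group, which the chip performs concurrently using bit-serial ACS units.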

DISPS-2.3
A Highly Scalable Symmetric/Asymmetric FIR Processor
Wei-Lung Liu,
Oscal T.-C. Chen,
Heng-Chou Chen (National Chung Cheng University),
Hsun-Chang Hsieh (Industrial Technology Research Institute)
Based on the radix-4 Booth algorithm, we developed a highly scalable symmetric/asymmetric finite impulse response (FIR) architecture comprising a pre-processing unit, data latches, configurable connection units, double Booth decoders, coefficient registers, a path control unit, and a post-processing unit. To achieve scalability, particular attention is paid to the configurable connection units between the data latches and the double Booth decoders. The precision of the filter coefficients is adjustable through the path control unit, and the double Booth decoding is implemented efficiently. In particular, the proposed architecture uses only data-path control to accomplish scalable operation, without changing the word lengths or the components of the data latches and filter taps. A practical FIR processor that accommodates 8- and 16-bit input data and filter coefficients was implemented with the COMPASS 5 V cell library in TSMC 0.6 µm CMOS technology. The processor supports ten operation modes covering asymmetric, symmetric, and anti-symmetric filter coefficients at 64, 63, 32, or 16 taps for various industrial applications.
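Two ideas from this abstract can be shown in software: radix-4 Booth recoding, which turns a coefficient into digits in {-2, -1, 0, 1, 2} so that each partial product is only a shift or negation of the input, and the symmetric-coefficient trick of pre-adding the two samples that share a coefficient, which halves the number of multiplications. The Python sketch below is hypothetical: it covers only the even-length symmetric case, assumes 16-bit two's-complement coefficients, and does not model the paper's double Booth decoders or path control unit.

```python
def booth_radix4_digits(c, width):
    """Radix-4 Booth recoding of a two's-complement `width`-bit integer (width even).
    Returns digits in {-2,-1,0,1,2}, least significant first: c == sum(d * 4**i)."""
    assert width % 2 == 0 and -(1 << (width - 1)) <= c < (1 << (width - 1))
    u = c & ((1 << width) - 1)  # two's-complement bit pattern
    digits, prev = [], 0        # prev is the implicit bit b[-1] = 0
    for i in range(0, width, 2):
        b0 = (u >> i) & 1
        b1 = (u >> (i + 1)) & 1
        digits.append(-2 * b1 + b0 + prev)  # d = -2*b[2i+1] + b[2i] + b[2i-1]
        prev = b1
    return digits


def booth_multiply(x, c, width):
    """Multiply x by c using only shifted, possibly negated copies of x (0, +/-x, +/-2x)."""
    return sum((d * x) << (2 * i) for i, d in enumerate(booth_radix4_digits(c, width)))


def symmetric_fir(x, half_coeffs, width=16):
    """Even-length symmetric FIR with h[k] == h[N-1-k], N = 2*len(half_coeffs).
    Pre-adds the two samples sharing each coefficient, halving the multiplications."""
    n_taps = 2 * len(half_coeffs)
    padded = [0] * (n_taps - 1) + list(x)  # zero initial state
    y = []
    for n in range(len(x)):
        window = padded[n:n + n_taps]      # x[n-N+1] .. x[n]
        acc = 0
        for k, h in enumerate(half_coeffs):
            pair = window[-1 - k] + window[k]  # x[n-k] + x[n-(N-1-k)]
            acc += booth_multiply(pair, h, width)
        y.append(acc)
    return y
```

Anti-symmetric filters would subtract the pair instead of adding it, and odd-length filters such as the 63-tap mode have an unpaired centre tap; neither variant is shown.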

DISPS-2.4
A Novel Memory-Based FFT Processor for DMT/OFDM Applications
Ching-Hsien Chang,
Chin-Liang Wang,
Yu-Tai Chang (Department of Electrical Engineering, National Tsing Hua University, Hsinchu, Taiwan 300, Republic of China)
This paper presents a novel VLSI architecture for
computing the N-point discrete Fourier transform (DFT)
based on a radix-2 fast algorithm, where N is a power of
two. The architecture consists of one complex multiplier,
two complex adders, and several special-purpose memory units.
It computes one transform sample every log2(N)+1
clock cycles on average. For N=512, the
chip area is about 5742 µm × 5222 µm and the
throughput is up to 4M transform samples per second
in 0.6 µm CMOS technology. This area-time performance
makes the proposed design attractive for
long-length DFT applications such as ADSL and OFDM
systems.
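For reference, the radix-2 algorithm underlying such a processor is the textbook decimation-in-time FFT: log2(N) stages of butterflies, each with one complex multiplication and two complex additions. The Python sketch below (hypothetical function names) checks it against a direct DFT; it says nothing about the paper's memory organisation or scheduling.

```python
import cmath


def fft_radix2(x):
    """Iterative radix-2 decimation-in-time FFT; len(x) must be a power of two."""
    n = len(x)
    assert n > 0 and n & (n - 1) == 0, "N must be a power of two"
    a = [complex(v) for v in x]
    # Bit-reversal permutation of the input
    j = 0
    for i in range(1, n):
        bit = n >> 1
        while j & bit:
            j ^= bit
            bit >>= 1
        j |= bit
        if i < j:
            a[i], a[j] = a[j], a[i]
    # log2(N) butterfly stages: one complex multiply and two complex adds each
    size = 2
    while size <= n:
        w_step = cmath.exp(-2j * cmath.pi / size)
        for start in range(0, n, size):
            w = 1 + 0j
            for k in range(size // 2):
                u = a[start + k]
                t = w * a[start + k + size // 2]
                a[start + k] = u + t
                a[start + k + size // 2] = u - t
                w *= w_step
        size *= 2
    return a


def dft_naive(x):
    """Direct O(N^2) DFT, used only as a reference."""
    n = len(x)
    return [sum(x[m] * cmath.exp(-2j * cmath.pi * k * m / n) for m in range(n))
            for k in range(n)]


def cycles_per_sample(n):
    """Average cycles per output sample quoted in the abstract: log2(N) + 1."""
    log2_n = n.bit_length() - 1  # exact for powers of two
    return log2_n + 1
```

`cycles_per_sample(512)` returns 10. If that average and the 4M samples/s throughput apply together, they imply a clock of roughly 40 MHz; this is an inference from the abstract, not a reported figure.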

DISPS-2.5
Synthesis of Array Architectures for Block Matching Motion Estimation:
Design Exploration using the tool DG2VHDL
John Bonk (Naval Systems, Electronic Design Laboratory, Raytheon),
Andrew Stone,
Elias S Manolakos (Communications and Digital Signal Processing (CDSP) Center for Research and Graduate Studies, Northeastern University)
In this paper we present a design case study using DG2VHDL, a tool
that bridges the gap between an abstract graphical description of a DSP algorithm and
its concrete hardware description language (HDL) representation.
DG2VHDL automatically translates a Dependence Graph (DG)
into a synthesizable, behavioral VHDL entity that can be input to
industrial-strength behavioral compilers to produce silicon
implementations of the algorithm (FPGAs, ASICs). Full-search block-matching algorithm (FS-BMA)
motion estimation was selected for its current applications (MPEG,
HDTV, video conferencing) and for the rich literature
and architectural exploration of the last decade. We not only
demonstrate that the behavioral VHDL code produced automatically
by the tool leads, after behavioral synthesis, to an efficient
modular array architecture with distributed memory and control, but also
provide comparative statistics for several new FS-BMA architectures
derived for real-time motion estimation.
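The full-search block-matching algorithm that these array architectures implement evaluates every candidate displacement in a search window and keeps the one with the smallest sum of absolute differences (SAD). The Python sketch below is a plain software model, not DG2VHDL output; the 8x8 block size, the ±4 search range and the function name are illustrative assumptions.

```python
def full_search(cur, ref, bx, by, block=8, search=4):
    """Full-search block matching for the block at (bx, by) of frame `cur`.
    Returns (sad, dx, dy) for the displacement in `ref` with minimum SAD."""
    h, w = len(ref), len(ref[0])
    best = None
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            ry, rx = by + dy, bx + dx
            if ry < 0 or rx < 0 or ry + block > h or rx + block > w:
                continue  # candidate block falls outside the reference frame
            sad = sum(abs(cur[by + i][bx + j] - ref[ry + i][rx + j])
                      for i in range(block) for j in range(block))
            if best is None or sad < best[0]:  # keep the first minimum found
                best = (sad, dx, dy)
    return best
```

Every (dx, dy) candidate is independent, which is what allows an array architecture to compute many SADs concurrently with local data reuse.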

DISPS-2.6
A High-Throughput, Low-Power Architecture and Its VLSI Implementation for DFT/IDFT Computation
Wei-Ren Shiue,
Shen-Fu Hsiao (Inst. Compt. Eng., NSYSU, Taiwan)
A recursive algorithm for computing both the forward and backward DFT is proposed, in which the common entries in the decomposed matrices are factored out to reduce the number of multipliers needed in the implementation. The derived algorithm is essentially a band-matrix-vector multiplication with a matrix bandwidth of 3. By exploiting the heterogeneous dependency graphs of the matrix-vector multiplication and using an efficient mapping technique, only log N adders and log N - 1 multipliers are needed to compute a DFT of size N, a large saving over a recently proposed systolic architecture that requires 3 log N adders and 3 log N multipliers. Furthermore, because the architecture is simple and regular, a low-power processor can be designed by turning off idle hardware components at the appropriate time steps. A VLSI implementation of the DFT/IDFT processor, using a distributed FSM for timing control, is also presented.
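The abstract does not give the band-matrix factorization, so it is not reproduced here. As a hedged illustration of how one forward-DFT datapath can also serve the inverse transform, the Python sketch below uses the standard identity IDFT(X) = conj(DFT(conj(X)))/N; this is a generic technique, not necessarily the authors' method. `unit_counts` reproduces the adder and multiplier counts quoted above, assuming the logarithm is base 2. All function names are hypothetical.

```python
import cmath


def dft(x):
    """Direct forward DFT (reference model of a forward-only datapath)."""
    n = len(x)
    return [sum(x[m] * cmath.exp(-2j * cmath.pi * k * m / n) for m in range(n))
            for k in range(n)]


def idft_via_dft(X):
    """Inverse DFT computed with the forward routine: conj(DFT(conj(X))) / N."""
    n = len(X)
    return [v.conjugate() / n for v in dft([complex(v).conjugate() for v in X])]


def unit_counts(n):
    """(adders, multipliers) quoted in the abstract, assuming log base 2."""
    log_n = n.bit_length() - 1
    return {"proposed": (log_n, log_n - 1), "systolic": (3 * log_n, 3 * log_n)}
```

For N=512 this gives 9 adders and 8 multipliers for the proposed design against 27 of each for the systolic array it is compared with.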

DISPS-2.7
Novel Mapping of a Linear QR Architecture
Gaye Lightbody (Queen's University Belfast, Northern Ireland),
Richard L Walke (Defence Evaluation and Research Agency, Malvern, England),
Roger F Woods,
John V McCanny (Queen's University Belfast, Northern Ireland)
This paper presents a novel architecture mapping technique
that was essential in the design of a QR array forming
the core processor of a single-chip adaptive beamforming
system. The technique maps a triangular QR array
of 2m^2+3m+1 cells onto a linear architecture of m+1
processors. The result is a linear systolic
architecture with 100% hardware utilisation,
local interconnects, and dedicated processors for boundary
and internal cell operations. In addition, this paper
highlights the effect of latency on the validity of the
linear architecture.
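The cell count in the abstract factors as 2m^2+3m+1 = (2m+1)(m+1), which is the size of a triangular array with 2m+1 columns (an inference from the formula). Spreading those cells over m+1 processors therefore gives each processor exactly 2m+1 cells, the basis of 100% utilisation. The Python sketch below shows one such balanced assignment, with one processor for all boundary (diagonal) cells and m processors for the internal cells; it is a hypothetical counting illustration and ignores the data timing and latency issues the paper addresses.

```python
def triangular_cells(m):
    """Cells (row, col) of a triangular array with 2m+1 columns; diagonal cells are boundary cells."""
    p = 2 * m + 1
    return [(i, j) for j in range(p) for i in range(j + 1)]


def fold_onto_linear_array(m):
    """Assign every cell to one of m+1 processors.
    Processor 0 takes all 2m+1 boundary (diagonal) cells. Internal processor k
    (1..m) takes the internal cells of columns k and 2m+1-k, which hold
    k and 2m+1-k cells respectively, i.e. exactly 2m+1 in total."""
    p = 2 * m + 1
    procs = [[] for _ in range(m + 1)]
    for (i, j) in triangular_cells(m):
        if i == j:
            procs[0].append((i, j))          # boundary cell
        else:
            k = j if j <= m else p - j       # pair column j with column 2m+1-j
            procs[k].append((i, j))          # internal cell
    return procs
```

With every processor holding the same number of cells, each one can be kept busy in every one of the 2m+1 time slots of a schedule period.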

DISPS-2.8
An Unrestrictedly Parallel Scheme for Ultra-High-Rate Reprogrammable Huffman Coding
Robert A Freking,
Keshab K Parhi (Dept. of Electrical and Computer Engineering, University of Minnesota)
This paper proposes a comprehensive method for overcoming the inherently serial nature of variable-length near-entropy coding to obtain unrestrictedly parallel realizations of Huffman compression. A codestream rearrangement technique, together with a symbol-stream order-recovery procedure, forms a concurrent approach capable of exceeding all previously attainable coderates. Furthermore, the method achieves 100% hardware utilization with no coderate overhead while keeping the output data in a traditional streamed format. To support this approach, bit-serial encoder and decoder designs with compelling speed and area advantages are developed to serve as the parallel processing elements; both are also suitable for more general contexts. The decoder, in particular, is optimally fast. Because the encoder and decoder designs are programmable, the composite approach is appropriate for a general-purpose ultra-high-speed codec. Benefits for low-power and variable-rate applications are briefly discussed.
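The serial dependence the abstract refers to is visible in an ordinary prefix-code decoder: the start of each codeword is known only after the previous one has been decoded. The Python sketch below builds a Huffman code, decodes serially, and then shows a simple workaround: splitting the symbol stream into blocks and recording each block's starting bit offset so that blocks can be decoded independently. That block scheme is a stand-in, not the paper's codestream rearrangement; it pays for parallelism with the offset side information, which the paper's method claims to avoid. Function names are hypothetical.

```python
import heapq
from collections import Counter


def huffman_code(symbols):
    """Build a Huffman code table {symbol: bitstring} from symbol frequencies."""
    freq = Counter(symbols)
    if len(freq) == 1:  # degenerate alphabet: give the lone symbol a 1-bit code
        return {next(iter(freq)): "0"}
    # Heap entries: (weight, unique tie-breaker, {symbol: partial code})
    heap = [(w, i, {s: ""}) for i, (s, w) in enumerate(freq.items())]
    heapq.heapify(heap)
    counter = len(heap)
    while len(heap) > 1:
        w1, _, c1 = heapq.heappop(heap)
        w2, _, c2 = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in c1.items()}
        merged.update({s: "1" + c for s, c in c2.items()})
        heapq.heappush(heap, (w1 + w2, counter, merged))
        counter += 1
    return heap[0][2]


def decode(bits, code):
    """Serial prefix-code decoding: each codeword boundary depends on the previous one."""
    inverse = {c: s for s, c in code.items()}
    out, cur = [], ""
    for b in bits:
        cur += b
        if cur in inverse:
            out.append(inverse[cur])
            cur = ""
    return out


def encode_blocks(symbols, code, n_blocks):
    """Encode in blocks, recording each block's starting bit offset (side information)."""
    size = -(-len(symbols) // n_blocks)  # ceiling division
    stream, offsets = "", []
    for start in range(0, len(symbols), size):
        offsets.append(len(stream))
        stream += "".join(code[s] for s in symbols[start:start + size])
    return stream, offsets


def decode_blocks(stream, offsets, code):
    """Each block could go to its own decoder; here they simply run one after another."""
    bounds = offsets + [len(stream)]
    parts = [decode(stream[bounds[i]:bounds[i + 1]], code) for i in range(len(offsets))]
    return [s for part in parts for s in part]
```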

DISPS-2.9
Flexible Video Compression Systems Using an Analog Vector Quantization Chip
Stefano Rovetta,
Rodolfo Zunino (DIBE - Genoa University)
Vector quantization systems are usually based on
digital implementations of the core operations.
In this paper, video compression systems exploiting
an analog implementation of vector quantization are
presented. The main advantages of analog design are
exploited, yielding notable performance
compared with other solutions in the literature.
The circuit features a highly modular, fully
parallel internal architecture. Multiple circuits can
easily be connected to obtain a larger codebook and
a higher vector dimension. Codebook synthesis
is also described.
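A digital software model makes the two operations in this abstract concrete: nearest-codeword search (which the chip performs in parallel in the analog domain) and codebook synthesis. The Python sketch below assumes squared Euclidean distance and uses generalized Lloyd (k-means) training; the abstract does not state which distance or training method the authors use. `vq_encode_multichip` is a hypothetical model of connecting several circuits for a larger codebook: each module reports its local winner and a final comparison selects the global one.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance (assumed metric)."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def vq_encode(vector, codebook):
    """Index of the nearest codeword; ties go to the lower index."""
    return min(range(len(codebook)), key=lambda i: sq_dist(vector, codebook[i]))


def vq_encode_multichip(vector, chips):
    """Codebook split across modules: pick the best of each module's local winner."""
    best, base = None, 0
    for chip in chips:
        i = vq_encode(vector, chip)
        d = sq_dist(vector, chip[i])
        if best is None or d < best[0]:  # strict '<' keeps the earlier module on ties
            best = (d, base + i)
        base += len(chip)
    return best[1]


def train_codebook(data, size, iterations=20, seed=0):
    """Generalized Lloyd (k-means) codebook design, one common synthesis method."""
    rng = random.Random(seed)
    codebook = [list(v) for v in rng.sample(data, size)]
    for _ in range(iterations):
        clusters = [[] for _ in range(size)]
        for v in data:
            clusters[vq_encode(v, codebook)].append(v)
        for k, members in enumerate(clusters):
            if members:  # keep the old codeword if its cell is empty
                codebook[k] = [sum(c) / len(members) for c in zip(*members)]
    return codebook
```

Because ties are broken toward the lower index in both search functions, the split search returns the same index as a search over the full codebook.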