VLSI Building Blocks

Hybrid Multiplier/CORDIC Unit for Online Handwriting Recognition

Authors:

Stephen McInerney
Richard B Reilly

Page (NA) Paper number 1470

Abstract:

Traditionally, Online Handwriting Recognition (OHR) implementations use general-purpose processor architectures. The pre-processing step of OHR comprises regular array-based tasks such as normalisation, feature extraction and segmentation. Standard processor architectures, however, cannot efficiently support the varied arithmetic operations required by pre-processing, so these tasks seem ideally suited to custom hardware acceleration. CORDIC offers all the elementary functions required for pre-processing but is inefficient for linear-mode operations (multiplication/division) because of its serial nature. A hybrid Multiplier/CORDIC architecture is proposed in which a fast iterative multiplier/MAC shares hardware with a serial CORDIC unit. This multiplier retires 6 bits per cycle with minor additional hardware requirements. The hybrid offers improved general performance for signal-processing applications and is targeted at the pre-processing task of OHR. Performance results are included. http://wwwdsp.ucd.ie
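
To illustrate why linear-mode CORDIC is slow for multiplication, the sketch below models the two CORDIC modes the abstract mentions in floating point. It is not the authors' hybrid architecture: the function names, the iteration count of 24 and the use of floats instead of fixed-point shifts are all assumptions made for illustration. The point to notice is that the linear-mode loop resolves roughly one bit of the product per iteration, whereas the abstract's multiplier retires 6 bits per cycle.

```python
import math

def cordic_rotate(angle, iterations=24):
    """Circular rotation mode: approximate (cos(angle), sin(angle)), |angle| <= pi/2."""
    # Elementary rotation angles atan(2^-i) and the accumulated CORDIC gain.
    angles = [math.atan(2.0 ** -i) for i in range(iterations)]
    gain = 1.0
    for i in range(iterations):
        gain *= math.sqrt(1.0 + 2.0 ** (-2 * i))
    x, y, z = 1.0 / gain, 0.0, angle      # pre-scale x so the result has unit length
    for i in range(iterations):
        d = 1.0 if z >= 0 else -1.0       # rotate towards z = 0
        x, y = x - d * y * 2.0 ** -i, y + d * x * 2.0 ** -i
        z -= d * angles[i]
    return x, y

def cordic_multiply(a, b, iterations=24):
    """Linear rotation mode: approximate a * b for |b| < 2, about one bit per iteration."""
    y, z = 0.0, b
    for i in range(iterations):
        d = 1.0 if z >= 0 else -1.0       # drive the residual z towards 0
        y += d * a * 2.0 ** -i            # shift-and-add on the product accumulator
        z -= d * 2.0 ** -i
    return y
```

In hardware the multiplications by 2^-i would be arithmetic shifts; the float arithmetic here only mimics that behaviour.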

IC991470.PDF (From Author) IC991470.PDF (Rasterized)



Low-Power Bit-Serial Viterbi Decoder for Next Generation Wide-Band CDMA Systems

Authors:

Hiroshi Suzuki
Yun-Nan Chang
Keshab K Parhi

Page (NA) Paper number 1788

Abstract:

This paper presents a low-power bit-serial Viterbi decoder chip with a coding rate of r=1/3 and a constraint length of K=9 (256 states). The chip has been implemented in 0.5 µm three-layer-metal CMOS technology and is targeted at high-speed convolutional decoding for next-generation wireless applications such as wide-band CDMA mobile systems and wireless ATM LANs. The chip is expected to operate at 20 Mbps under 3.3 V and at 2 Mbps under 1.8 V. The Add-Compare-Select (ACS) units have been designed using bit-serial arithmetic, which has made it feasible to execute 256 ACS operations in parallel. For trace-back operations, we have developed a novel power-efficient trace-back scheme and an application-specific memory, designed around the fact that 256 bits must be written simultaneously in a write operation but only one bit needs to be accessed in a read operation. We have estimated that the chip dissipates only 10 mW at 2 Mbps operation under 1.8 V.
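
The add-compare-select and trace-back steps named in the abstract can be sketched in software as follows. This is a plain hard-decision Viterbi decoder, not the authors' bit-serial design; the state convention (newest input bit in the LSB), the Hamming branch metric and the function names are assumptions. The code is generic in K and in the generator list, so it covers a 256-state code as well as the small one it is easier to experiment with. Note how each stage produces one decision bit per state (all written together) while trace-back reads a single bit per stage, which is the access pattern the abstract's memory is designed around.

```python
def parity(x):
    return bin(x).count("1") & 1

def encode(bits, gens, K):
    """Feedforward convolutional encoder; gens are K-bit generator masks (assumed)."""
    state, out = 0, []
    mask = (1 << (K - 1)) - 1
    for u in bits:
        reg = (state << 1) | u                  # K-bit register, newest bit in the LSB
        out.append([parity(reg & g) for g in gens])
        state = reg & mask
    return out

def viterbi_decode(received, gens, K):
    S = 1 << (K - 1)                            # number of trellis states
    INF = float("inf")
    metrics = [0] + [INF] * (S - 1)             # encoder is assumed to start in state 0
    history = []                                # one decision bit per state per stage
    for r in received:
        new_metrics, decisions = [INF] * S, [0] * S
        for s in range(S):
            u = s & 1                           # input bit on every branch entering s
            best, best_d = INF, 0
            for d in (0, 1):                    # the two candidate predecessor states
                prev = (s >> 1) | (d << (K - 2))
                reg = (prev << 1) | u
                bm = sum(parity(reg & g) != rb for g, rb in zip(gens, r))
                cand = metrics[prev] + bm       # add
                if cand < best:                 # compare
                    best, best_d = cand, d      # select
            new_metrics[s], decisions[s] = best, best_d
        metrics = new_metrics
        history.append(decisions)
    # Trace back from the best final state, reading one decision bit per stage.
    s = min(range(S), key=metrics.__getitem__)
    bits = []
    for decisions in reversed(history):
        bits.append(s & 1)
        s = (s >> 1) | (decisions[s] << (K - 2))
    return bits[::-1]
```

The inner loop over `s` is what a fully parallel design would unroll into one ACS unit per state.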

IC991788.PDF (From Author) IC991788.PDF (Rasterized)



A Highly-scalable Symmetric/Asymmetric FIR Processor

Authors:

Wei-Lung Liu
Oscal T.-C. Chen

Page (NA) Paper number 5092

Abstract:

Based on the radix-4 Booth algorithm, we developed a highly scalable symmetric/asymmetric finite impulse response (FIR) architecture which comprises a pre-processing unit, data latches, configurable connection units, double Booth decoders, coefficient registers, a path control unit, and a post-processing unit. In order to achieve scalability, the configurable connection units between the data latches and the double Booth decoders have been effectively addressed. The precision of the filter coefficients is adjustable by using the path control unit. The double Booth decoding is efficiently implemented. In particular, the proposed architecture employs only data-path controls to accomplish the scalable operations, without changing the word lengths or components of the data latches and filter taps. A practical FIR processor, which can accommodate dynamic ranges of 8 and 16 bits for input data and filter coefficients, was implemented using the COMPASS 5 V cell library in TSMC 0.6 µm CMOS technology. This processor supports ten different operation modes with asymmetric, symmetric, and anti-symmetric filter coefficients at 64, 63, 32, or 16 taps for various industrial applications.
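
For readers unfamiliar with the radix-4 Booth algorithm the abstract builds on, the following sketch shows the standard recoding of a two's-complement multiplier into digits from {-2, -1, 0, 1, 2}, which halves the number of partial products. It illustrates the textbook algorithm only, not the paper's "double Booth decoder"; the function names and the default width are assumptions.

```python
def booth_radix4_digits(multiplier, bits):
    """Recode a `bits`-wide two's-complement integer into radix-4 Booth digits.

    Digits are returned least significant first; `bits` must be even.
    """
    if bits % 2:
        raise ValueError("bits must be even")
    m = multiplier & ((1 << bits) - 1)     # two's-complement bit pattern
    digits = []
    prev = 0                               # implicit 0 to the right of the LSB
    for i in range(0, bits, 2):
        b0 = (m >> i) & 1
        b1 = (m >> (i + 1)) & 1
        digits.append(-2 * b1 + b0 + prev) # overlapping 3-bit window
        prev = b1
    return digits

def booth_multiply(a, b, bits=16):
    """Multiply a by b (b must fit in `bits` two's-complement bits)."""
    # Each digit selects 0, +/-a or +/-2a; 4**k corresponds to a left shift by 2k.
    return sum(d * a * 4 ** k for k, d in enumerate(booth_radix4_digits(b, bits)))
```

With an 8-bit coefficient only four partial products are summed, and with 16 bits only eight, which is why the same datapath can serve both word lengths with routing changes alone.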

IC995092.PDF (From Author) IC995092.PDF (Rasterized)



A Novel Memory-Based FFT Processor For DMT/OFDM Applications

Authors:

Ching-Hsien Chang, Department of Electrical Engineering, National Tsing Hua University Hsinchu, Taiwan 300, Republic of China (China)
Chin-Liang Wang, Department of Electrical Engineering, National Tsing Hua University Hsinchu, Taiwan 300, Republic of China (China)
Yu-Tai Chang, Department of Electrical Engineering, National Tsing Hua University Hsinchu, Taiwan 300, Republic of China (China)

Page (NA) Paper number 1505

Abstract:

This paper presents a novel VLSI architecture for computing the N-point discrete Fourier transform (DFT) based on a radix-2 fast algorithm, where N is a power of two. The architecture consists of one complex multiplier, two complex adders, and some special memory units. It can compute one transform sample every log2(N)+1 clock cycles on average. For N=512, the required chip area is about 5742 µm x 5222 µm and the throughput is up to 4M transform samples per second in 0.6 µm CMOS technology. This area-time performance makes the proposed design attractive for long-length DFT applications, such as ADSL and OFDM systems.
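
For N=512 the stated rate works out to log2(512)+1 = 10 clock cycles per transform sample; if the 4M samples per second figure refers to the same operating point, that would correspond to a clock of roughly 40 MHz (an inference, not a figure from the paper). The sketch below is a generic iterative radix-2 decimation-in-time FFT, not the authors' memory organisation. It is included because its butterfly uses exactly one complex multiplication and two complex additions, the same arithmetic resources the abstract lists, reused over a working array held in memory.

```python
import cmath

def fft_radix2(x):
    """Iterative radix-2 decimation-in-time FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    a = list(x)
    # Bit-reversal permutation of the input order.
    j = 0
    for i in range(1, n):
        bit = n >> 1
        while j & bit:
            j ^= bit
            bit >>= 1
        j |= bit
        if i < j:
            a[i], a[j] = a[j], a[i]
    # log2(n) stages, each with n/2 butterflies.
    size = 2
    while size <= n:
        half = size // 2
        w_step = cmath.exp(-2j * cmath.pi / size)
        for start in range(0, n, size):
            w = 1.0 + 0j
            for k in range(half):
                u = a[start + k]
                t = w * a[start + k + half]     # the one complex multiplication
                a[start + k] = u + t            # first complex addition
                a[start + k + half] = u - t     # second complex addition
                w *= w_step
        size *= 2
    return a
```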

IC991505.PDF (From Author) IC991505.PDF (Rasterized)



Synthesis Of Array Architectures For Block Matching Motion Estimation: Design Exploration Using The Tool DG2VHDL

Authors:

John Bonk
Andrew Stone
Elias S Manolakos

Page (NA) Paper number 2210

Abstract:

In this paper we present a design case study using DG2VHDL, a tool which bridges the gap between an abstract graphical description of a DSP algorithm and its concrete hardware description language (HDL) representation. DG2VHDL automatically translates a Dependence Graph (DG) into a synthesizable, behavioral VHDL entity that can be input to industrial-strength behavioral compilers for producing silicon implementations of the algorithm (FPGAs, ASICs). Full Search Block Matching Motion Estimation was selected for its current applications (MPEG, HDTV, video conferencing) as well as for the richness of its literature and architectural exploration over the last decade. We not only demonstrate that the behavioral VHDL code produced automatically by the tool leads, after behavioral synthesis, to an efficient modular array architecture with distributed memory and control, but also provide comparative statistics for several new FS-BMA architectures derived for real-time motion estimation.
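
The algorithm that serves as the case study, full-search block matching, can be written down in a few lines. The sketch below is a software reference only and says nothing about DG2VHDL or the array architectures derived with it; the sum-of-absolute-differences cost, the block size, the search range and the function name are assumptions. The four nested loops (two over displacements, two over pixels) form the regular iteration space that a dependence graph would capture.

```python
def full_search(cur, ref, bx, by, block=4, search=2):
    """Return (dx, dy, sad) with the smallest SAD for the block at (bx, by) in `cur`.

    `cur` and `ref` are 2-D lists of pixel values; candidates that fall
    outside the reference frame are skipped.
    """
    h, w = len(ref), len(ref[0])
    best = None
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            x0, y0 = bx + dx, by + dy
            if x0 < 0 or y0 < 0 or x0 + block > w or y0 + block > h:
                continue
            sad = sum(abs(cur[by + j][bx + i] - ref[y0 + j][x0 + i])
                      for j in range(block) for i in range(block))
            if best is None or sad < best[2]:
                best = (dx, dy, sad)
    return best
```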

IC992210.PDF (From Author) IC992210.PDF (Rasterized)



A High-Throughput, Low Power Architecture and Its VLSI Implementation for DFT/IDFT Computation

Authors:

Shen-Fu Hsiao, Inst. Compt. Eng., NSYSU, Taiwan (Taiwan)
Wei-Ren Shiue, Inst. Compt. Eng., NSYSU, Taiwan (Taiwan)

Page (NA) Paper number 1673

Abstract:

A recursive algorithm for computing both the forward and the backward DFT is proposed in which the common entries in the decomposed matrices are factored out in order to reduce the number of multipliers needed during implementation. The derived algorithm is essentially a band-matrix-vector multiplication with a matrix bandwidth of 3. By exploiting the heterogeneous dependency graphs for the matrix-vector multiplication and using an efficient mapping technique, only logN adders and logN-1 multipliers are needed to compute a DFT of size N, a great saving over a recently proposed systolic architecture which calls for 3logN adders and 3logN multipliers. Furthermore, owing to the simplicity and regularity of the architecture, it is possible to design a low-power processor by turning off idle hardware components at the proper time steps. A VLSI implementation of the DFT/IDFT processor with a distributed FSM for timing control is also presented.

IC991673.PDF (Scanned)



Novel Mapping Of A Linear QR Architecture

Authors:

Gaye Lightbody, QUEEN'S UNIVERSITY BELFAST, NORTHERN IRELAND (Ireland)
Richard L. Walke, DEFENSE EVALUATION AND RESEARCH AGENCY, MALVERN, ENGLAND (U.K.)
Roger F. Woods, QUEEN'S UNIVERSITY BELFAST, NORTHERN IRELAND (Ireland)
John V. McCanny, QUEEN'S UNIVERSITY BELFAST, NORTHERN IRELAND (Ireland)

Page (NA) Paper number 2357

Abstract:

This paper presents a novel architecture mapping technique which was essential in the design of a QR array which forms the core processor of a single chip adaptive beamforming system. The mapping technique assigns a QR triangular array of 2m^2+3m+1 cells down onto a linear architecture of m+1 processors. The mapping results in a linear systolic architecture with one hundred percent hardware utilisation, local interconnects and individual processors for boundary and internal cell operations. In addition, this paper highlights the effect latency has on the validity of the linear architecture.
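
A short check of the cell count suggests why full utilisation is arithmetically possible. The even distribution of cells across processors is an assumption here; the abstract gives only the two totals.

```latex
2m^2 + 3m + 1 = (2m + 1)(m + 1)
\quad\Longrightarrow\quad
\frac{2m^2 + 3m + 1}{m + 1} = 2m + 1 .
```

The cell count is therefore an exact multiple of the processor count, so each of the m+1 processors can be assigned 2m+1 cell operations per array update with none left idle.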

IC992357.PDF (From Author) IC992357.PDF (Rasterized)



An Unrestrictedly Parallel Scheme for Ultra-High-Rate Reprogrammable Huffman Coding

Authors:

Robert A Freking
Keshab K Parhi

Page (NA) Paper number 2100

Abstract:

This paper proposes a comprehensive method for overcoming the inherently serial nature of variable-length near-entropy coding to obtain unrestrictedly parallel realizations of Huffman compression. A codestream rearrangement technique together with a symbol-stream order-recovery procedure form a concurrent approach capable of exceeding all previously attainable coderate figures. Furthermore, the method is noteworthy for achieving 100% hardware utilization with no coderate overhead while maintaining data output in a traditional streamed format. To further this endeavor, bit-serial encoder and decoder designs that possess compelling speed and area advantages are developed for service as parallel processing elements. However, both are suitable in more general contexts as well. The decoder, in particular, is optimally fast. The encoder and decoder designs are programmable, thus suggesting the appropriateness of the composite approach for a general-purpose ultra-high-speed codec. Benefits for low-power and variable-rate applications are briefly discussed.
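
The "inherently serial nature" the abstract refers to is visible in a conventional Huffman decoder: the start of each codeword is known only after the previous one has been fully decoded. The sketch below shows that serial baseline with a table-programmable code, not the paper's codestream rearrangement or its parallel scheme; the function names and the string-of-bits representation are assumptions.

```python
import heapq
from collections import Counter

def build_code(freqs):
    """Return {symbol: bitstring} for a Huffman code built from symbol frequencies."""
    # Heap entries carry a unique tie-breaker so the dicts are never compared.
    heap = [(f, i, {s: ""}) for i, (s, f) in enumerate(sorted(freqs.items()))]
    heapq.heapify(heap)
    if len(heap) == 1:
        return {s: "0" for s in heap[0][2]}
    tie = len(heap)
    while len(heap) > 1:
        f0, _, c0 = heapq.heappop(heap)
        f1, _, c1 = heapq.heappop(heap)
        merged = {s: "0" + b for s, b in c0.items()}
        merged.update({s: "1" + b for s, b in c1.items()})
        heapq.heappush(heap, (f0 + f1, tie, merged))
        tie += 1
    return heap[0][2]

def encode(symbols, code):
    return "".join(code[s] for s in symbols)

def decode(bits, code):
    """Bit-serial decode: consume one bit per step until a codeword is recognised."""
    inverse = {b: s for s, b in code.items()}
    out, cur = [], ""
    for bit in bits:
        cur += bit
        if cur in inverse:          # codeword boundary found only now
            out.append(inverse[cur])
            cur = ""
    return out
```

Because `code` is an ordinary table, swapping in a different code requires no change to the encoder or decoder logic, which mirrors the programmability the abstract emphasises.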

IC992100.PDF (From Author) IC992100.PDF (Rasterized)



Flexible Video Compression Systems Using An Analog Vector Quantization Chip

Authors:

Stefano Rovetta
Rodolfo Zunino

Page (NA) Paper number 1394

Abstract:

Vector quantization systems are usually based on a digital implementation of the core operations. In this paper, video compression systems exploiting an analog implementation of vector quantization are presented. The main advantages of analog design are exploited, yielding notable performance compared with other solutions found in the literature. The circuit features a highly modular, completely parallel internal architecture. Several circuits can easily be connected to obtain a larger codebook size and a larger vector dimension. Synthesis of codebooks is also described.
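
The core operation that the chip carries out in analog form is a nearest-codeword search. A digital reference version is sketched below; the squared Euclidean distance, the tuple representation and the function names are assumptions, and the abstract does not specify the distance measure used by the circuit. Each distance computation is independent of the others, which is what makes a completely parallel implementation natural.

```python
def nearest_codeword(vector, codebook):
    """Return the index of the codeword closest to `vector` (squared Euclidean distance)."""
    def dist2(codeword):
        return sum((v - c) ** 2 for v, c in zip(vector, codeword))
    return min(range(len(codebook)), key=lambda i: dist2(codebook[i]))

def quantize(vectors, codebook):
    """Encoder: replace each input vector by the index of its nearest codeword."""
    return [nearest_codeword(v, codebook) for v in vectors]

def reconstruct(indices, codebook):
    """Decoder: a table lookup from index back to codeword."""
    return [codebook[i] for i in indices]
```

Enlarging the codebook corresponds to adding entries to `codebook`, and enlarging the vector dimension to lengthening each tuple, the two directions in which the abstract says circuits can be combined.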

IC991394.PDF (From Author) IC991394.PDF (Rasterized)
