ICASSP99 VLSI Building Blocks

VLSI Building Blocks
Hybrid Multiplier/CORDIC Unit for Online Handwriting Recognition
Authors: Stephen McInerney, Richard B Reilly
Paper number 1470
Abstract: Traditionally, Online Handwriting Recognition (OHR) implementations use general-purpose processor architectures. The pre-processing step of OHR comprises regular array-based tasks such as normalisation, feature extraction and segmentation. Standard processor architectures cannot, however, efficiently support the varied arithmetic operations required by pre-processing. These tasks seem ideally suited to custom hardware acceleration. CORDIC offers all the required elementary functions for pre-processing but is inefficient for linear-mode operations (multiplication/division) due to its serial nature. A hybrid Multiplier/CORDIC architecture is proposed in which a fast iterative multiplier/MAC shares hardware with a serial CORDIC unit. This multiplier retires 6 bits per cycle with minor additional hardware requirements. The hybrid offers improved general performance for signal-processing applications and is targeted at the pre-processing task of OHR. Performance results are included.
http://wwwdsp.ucd.ie

Low-Power Bit-Serial Viterbi Decoder for Next Generation Wide-Band CDMA Systems
Authors: Hiroshi Suzuki, Yun-Nan Chang, Keshab K Parhi
Paper number 1788
Abstract: This paper presents a low-power bit-serial Viterbi decoder chip with coding rate r=1/3 and constraint length K=9 (256 states). The chip has been implemented in 0.5 µm three-layer-metal CMOS technology and is targeted at high-speed convolutional decoding for next-generation wireless applications such as wide-band CDMA mobile systems and wireless ATM LANs. The chip is expected to operate at 20 Mbps under 3.3 V and at 2 Mbps under 1.8 V. The Add-Compare-Select (ACS) units have been designed using bit-serial arithmetic, which has made it feasible to execute 256 ACS operations in parallel.
For trace-back operations, we have developed a novel power-efficient trace-back scheme and an application-specific memory, designed around the observation that 256 bits must be written simultaneously in a write operation but only one bit needs to be accessed in a read operation. We estimate that the chip dissipates only 10 mW at 2 Mbps operation under 1.8 V.

A Highly-scalable Symmetric/Asymmetric FIR Processor
Authors: Wei-Lung Liu, Oscal T.-C. Chen
Paper number 5092
Abstract: Based on the radix-4 Booth algorithm, we developed a highly scalable symmetric/asymmetric finite impulse response (FIR) architecture which comprises a pre-processing unit, data latches, configurable connection units, double Booth decoders, coefficient registers, a path control unit, and a post-processing unit. To achieve scalability, the configurable connection units between the data latches and the double Booth decoders have been effectively addressed. The precision of the filter coefficients is adjustable by means of the path control unit. The double Booth decoding is efficiently implemented. In particular, the proposed architecture employs only data-path controls to accomplish the scalable operations, without changing the word lengths or components of the data latches and filter taps. A practical FIR processor, which can accommodate dynamic ranges of 8 and 16 bits for input data and filter coefficients, was implemented using the COMPASS 5 V cell library in TSMC 0.6 µm CMOS technology. This processor supports ten different operation modes of asymmetric, symmetric, and anti-symmetric filter coefficients at 64, 63, 32, or 16 taps for various industrial applications.
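
As a reading aid for the radix-4 Booth algorithm named in the abstract above, here is a minimal software sketch of the recoding step: a two's-complement multiplier is turned into digits from {-2, -1, 0, 1, 2}, one digit per pair of bits, which halves the number of partial products. The function names `booth_radix4_digits` and `booth_multiply` are hypothetical, and this is a textbook illustration, not the paper's double Booth decoder or its data path.

```python
def booth_radix4_digits(multiplier: int, bits: int) -> list[int]:
    """Recode a two's-complement multiplier into radix-4 Booth digits (LSB first).

    Assumes -2**(bits-1) <= multiplier < 2**(bits-1).
    """
    if bits % 2:
        bits += 1  # use an even width; masking below sign-extends Python ints
    # Two's-complement bit pattern, with the implicit bit b[-1] = 0 appended on the right.
    pattern = (multiplier & ((1 << bits) - 1)) << 1
    digits = []
    for i in range(0, bits, 2):
        b_prev = (pattern >> i) & 1        # b[i-1]
        b_cur = (pattern >> (i + 1)) & 1   # b[i]
        b_next = (pattern >> (i + 2)) & 1  # b[i+1]
        digits.append(b_prev + b_cur - 2 * b_next)
    return digits


def booth_multiply(multiplicand: int, multiplier: int, bits: int) -> int:
    """Multiply by summing one shifted partial product per Booth digit."""
    product = 0
    for k, digit in enumerate(booth_radix4_digits(multiplier, bits)):
        product += (digit * multiplicand) << (2 * k)  # digit weight is 4**k
    return product
```

For an 8-bit multiplier this yields four partial products instead of eight, each of which is 0, ±1 or ±2 times the multiplicand and so needs only a shift and an optional negation.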

A Novel Memory-Based FFT Processor For DMT/OFDM Applications
Authors:
Ching-Hsien Chang, Department of Electrical Engineering, National Tsing Hua University, Hsinchu, Taiwan 300, Republic of China
Chin-Liang Wang, Department of Electrical Engineering, National Tsing Hua University, Hsinchu, Taiwan 300, Republic of China
Yu-Tai Chang, Department of Electrical Engineering, National Tsing Hua University, Hsinchu, Taiwan 300, Republic of China
Paper number 1505
Abstract: This paper presents a novel VLSI architecture for computing the N-point discrete Fourier transform (DFT) based on a radix-2 fast algorithm, where N is a power of two. The architecture consists of one complex multiplier, two complex adders, and some special memory units. It can compute one transform sample every log2(N)+1 clock cycles on average. For N=512, the required chip area is about 5742 µm x 5222 µm and the throughput is up to 4M transform samples per second in 0.6 µm CMOS technology. This area-time performance makes the proposed design attractive for long-length DFT applications, such as ADSL and OFDM systems.

Synthesis Of Array Architectures For Block Matching Motion Estimation: Design Exploration Using The Tool DG2VHDL
Authors: John Bonk, Andrew Stone, Elias S Manolakos
Paper number 2210
Abstract: In this paper we present a design case study using DG2VHDL, a tool which bridges the gap between an abstract graphical description of a DSP algorithm and its concrete hardware description language (HDL) representation. DG2VHDL automatically translates a Dependence Graph (DG) into a synthesizable, behavioral VHDL entity that can be input to industrial-strength behavioral compilers to produce silicon implementations of the algorithm (FPGAs, ASICs).
Full Search Block Matching Motion Estimation was selected for its current applications (MPEG, HDTV, video conferencing) as well as for the richness of its literature and architectural exploration over the last decade. We not only demonstrate that the behavioral VHDL code produced automatically by the tool leads, after behavioral synthesis, to an efficient modular array architecture with distributed memory and control, but also provide comparative statistics for several new FS-BMA architectures derived for real-time motion estimation.

A High-Throughput, Low Power Architecture and Its VLSI Implementation for DFT/IDFT Computation
Authors:
Shen-Fu Hsiao, Inst. Compt. Eng., NSYSU, Taiwan
Wei-Ren Shiue, Inst. Compt. Eng., NSYSU, Taiwan
Paper number 1673
Abstract: A recursive algorithm for computing both the forward and the backward DFT is proposed, in which the common entries of the decomposed matrices are factored out to reduce the number of multipliers needed in the implementation. The derived algorithm is essentially a band-matrix-vector multiplication with a matrix bandwidth of 3. By exploiting the heterogeneous dependency graphs of the matrix-vector multiplication and using an efficient mapping technique, only logN adders and logN-1 multipliers are needed to compute a DFT of size N, a substantial saving over a recently proposed systolic architecture which calls for 3logN adders and 3logN multipliers. Furthermore, owing to the simplicity and regularity of the architecture, it is possible to design a low-power processor by turning off idle hardware components at the appropriate time steps. A VLSI implementation of the DFT/IDFT processor with a distributed FSM for timing control is also presented.

Novel Mapping Of A Linear QR Architecture
Authors:
Gaye Lightbody, QUEEN'S UNIVERSITY BELFAST, NORTHERN IRELAND (Ireland)
Richard L. Walke, DEFENSE EVALUATION AND RESEARCH AGENCY, MALVERN, ENGLAND (U.K.)
Roger F. Woods, QUEEN'S UNIVERSITY BELFAST, NORTHERN IRELAND (Ireland)
John V. McCanny, QUEEN'S UNIVERSITY BELFAST, NORTHERN IRELAND (Ireland)
Paper number 2357
Abstract: This paper presents a novel architecture mapping technique which was essential in the design of a QR array that forms the core processor of a single-chip adaptive beamforming system. The mapping technique maps a triangular QR array of 2m^2+3m+1 cells onto a linear architecture of m+1 processors. The mapping results in a linear systolic architecture with one hundred percent hardware utilisation, local interconnects and individual processors for boundary and internal cell operations. In addition, this paper highlights the effect latency has on the validity of the linear architecture.

An Unrestrictedly Parallel Scheme for Ultra-High-Rate Reprogrammable Huffman Coding
Authors: Robert A Freking, Keshab K Parhi
Paper number 2100
Abstract: This paper proposes a comprehensive method for overcoming the inherently serial nature of variable-length near-entropy coding to obtain unrestrictedly parallel realizations of Huffman compression. A codestream rearrangement technique together with a symbol-stream order-recovery procedure forms a concurrent approach capable of exceeding all previously attainable coderate figures. Furthermore, the method is noteworthy for achieving 100% hardware utilization with no coderate overhead while maintaining data output in a traditional streamed format. To further this endeavor, bit-serial encoder and decoder designs that possess compelling speed and area advantages are developed for service as parallel processing elements. However, both are suitable in more general contexts as well. The decoder, in particular, is optimally fast.
The encoder and decoder designs are programmable, which suggests that the composite approach is appropriate for a general-purpose ultra-high-speed codec. Benefits for low-power and variable-rate applications are briefly discussed.

Flexible Video Compression Systems Using An Analog Vector Quantization Chip
Authors: Stefano Rovetta, Rodolfo Zunino
Paper number 1394
Abstract: Vector quantization systems are usually based on a digital implementation of the core operations. In this paper, video compression systems exploiting an analog implementation of vector quantization are presented. The main advantages of analog design are exploited, yielding notable performance compared with other solutions found in the literature. The circuit features a highly modular, completely parallel internal architecture. Multiple circuits can easily be connected to obtain a larger codebook size and a larger vector dimension. Synthesis of codebooks is also described.
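
For readers unfamiliar with the core operation of vector quantization mentioned in the abstract above, the following minimal sketch shows nearest-codeword search in software. It is a plain digital illustration under assumed names (`quantize`, `encode`, `decode`) and a made-up codebook; it does not model the analog circuit described in the paper.

```python
def quantize(vector, codebook):
    """Return the index of the codeword closest to `vector` (squared Euclidean distance)."""
    def dist2(codeword):
        return sum((v - c) ** 2 for v, c in zip(vector, codeword))
    return min(range(len(codebook)), key=lambda i: dist2(codebook[i]))


def encode(blocks, codebook):
    """Compress: replace every input block by the index of its nearest codeword."""
    return [quantize(block, codebook) for block in blocks]


def decode(indices, codebook):
    """Decompress: look each index up in the same codebook."""
    return [codebook[i] for i in indices]


# Hypothetical 2-dimensional codebook with three codewords.
codebook = [(0, 0), (10, 10), (0, 10)]
indices = encode([(1, 2), (9, 8), (1, 9)], codebook)
reconstructed = decode(indices, codebook)
```

Each distance computation is independent of the others, which is why the search lends itself to the completely parallel architecture the abstract describes.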