Session: DISPS-P3
Time: 3:30 - 5:30, Friday, May 11, 2001
Location: Exhibit Hall Area 7
Title: Communications System Design/Cordic Processors
Chair: Naresh Shanbhag

3:30, DISPS-P3.1
A FPGA-BASED VITERBI ALGORITHM IMPLEMENTATION FOR SPEECH RECOGNITION SYSTEMS
F. VARGAS, R. FAGUNDES, D. JUNIOR
This work proposes a speech recognition system based on a hardware/software co-design approach. The main advantage of this approach is a significant reduction in processing time for speech recognition, because part of the system is implemented in dedicated hardware. This work also discusses another way to implement “Hidden Markov Models” (HMM), a probabilistic model extensively used in speech recognition systems. In this new approach, the Viterbi algorithm, used to compute the HMM likelihood score, is “built in” together with the HMM structure designed in hardware, implementing probabilistic state machines that run as parallel processes, one for each word in the vocabulary handled by the system. So far, we have obtained a dramatic speed-up, with measurements around 500 times faster than a classic implementation and correctness comparable to that of other isolated-word recognition systems.
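As a software point of reference for the abstract above, the following is a minimal sketch of log-domain Viterbi scoring with one small HMM per vocabulary word, where the word whose model scores highest is selected. The two word models, their probabilities, and the function names are hypothetical illustrations, not the authors' models or hardware design.

```python
import math

def _log(p):
    # Log of a probability; zero probability maps to -infinity.
    return math.log(p) if p > 0 else float("-inf")

def viterbi_score(obs, init, trans, emit):
    """Log-probability of the single best state path for a discrete observation sequence."""
    n = len(init)
    score = [_log(init[s]) + _log(emit[s][obs[0]]) for s in range(n)]
    for o in obs[1:]:
        score = [
            max(score[p] + _log(trans[p][s]) for p in range(n)) + _log(emit[s][o])
            for s in range(n)
        ]
    return max(score)

def recognize(obs, models):
    """Score every word model (sequentially here; in parallel in hardware) and pick the best."""
    return max(models, key=lambda word: viterbi_score(obs, *models[word]))

# Hypothetical two-state left-to-right models over the symbols {0, 1}.
LEFT_TO_RIGHT = [[0.5, 0.5], [0.0, 1.0]]
MODELS = {
    "yes": ([1.0, 0.0], LEFT_TO_RIGHT, [[0.9, 0.1], [0.1, 0.9]]),
    "no": ([1.0, 0.0], LEFT_TO_RIGHT, [[0.1, 0.9], [0.9, 0.1]]),
}
```

Calling `recognize([0, 0, 1, 1], MODELS)` selects the "yes" model, because its best path explains the sequence with much higher probability than any path through the "no" model.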

3:30, DISPS-P3.2
CONFIGURABLE HARDWARE IMPLEMENTATION OF TRIPLE-DES ENCRYPTION ALGORITHM FOR WIRELESS LOCAL AREA NETWORK
P. HÄMÄLÄINEN, M. HÄNNIKÄINEN, T. HÄMÄLÄINEN, J. SAARINEN
This paper presents three implementations of the Triple Data Encryption Standard (3DES) algorithm on a configurable platform. The implementations are aimed at the Medium Access Control (MAC) protocol of a multimedia-capable Wireless Local Area Network (WLAN). For this reason, very strict timing constraints as well as demands for area efficiency are present. The MAC processing is handled by a Digital Signal Processor (DSP) and a Xilinx Virtex Field Programmable Gate Array (FPGA) chip. The latter is also used for the presented encryption implementations. As a result of the study, 3DES implementations with small area and reasonable throughput and, conversely, with large area and very high throughput are realized. Even though 3DES turns out to be quite large and resource-demanding, the implementations still leave enough chip area for the other MAC functions. Consequently, the set requirements are met and the cipher can be integrated into the system.

3:30, DISPS-P3.3
CORDIC REALIZATION OF THE TRANSVERSAL ADAPTIVE FILTER USING A TRIGONOMETRIC LMS ALGORITHM
M. CHAKRABORTY, A. DHAR, S. PERVIN
This paper presents a class of pipelined CORDIC architectures for the LMS-based transversal adaptive filter. For this, an alternate formulation of the LMS algorithm is considered, obtained by expressing the mean square error as a convex function of a set of angle variables that are monotonically related to the filter tap weights. The proposed architectures employ microlevel pipelining and can be adjusted to trade off throughput efficiency against hardware complexity.

3:30, DISPS-P3.4
A NOVEL TRELLIS-BASED SEARCHING SCHEME FOR EEAS-BASED CORDIC ALGORITHM
C. WU, A. WU
The CORDIC algorithm is a well-known iterative method for the computation of vector rotation. For applications that require forward rotation (or vector rotation) only, the Extended Elementary Angle Set (EEAS) scheme provides a relaxed approach to speed up the operation of the CORDIC algorithm. When determining the parameters of the EEAS-based CORDIC algorithm, two optimization problems are encountered. In previous work, a greedy algorithm was suggested to solve these optimization problems. However, for applications that require high-precision rotation, the results generated by the greedy algorithm may not be applicable. In this paper, we propose a novel searching algorithm, called the Trellis-based Searching (TBS) algorithm, to overcome this problem. Compared with the greedy algorithm used in the conventional EEAS-based CORDIC algorithm, the proposed TBS algorithm yields a clear performance improvement. Moreover, a derivation of the error bound as well as computer simulations are provided to support our arguments.
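For readers unfamiliar with the baseline, the following is a minimal floating-point sketch of conventional rotation-mode CORDIC, in which each micro-rotation uses the elementary angle arctan(2^-i). It does not implement the EEAS scheme or the TBS search described above; the function name and the iteration count are illustrative assumptions.

```python
import math

def cordic_rotate(x, y, angle, iterations=16):
    """Rotate (x, y) by `angle` radians with conventional CORDIC micro-rotations.

    Converges for |angle| up to roughly 1.74 rad, the sum of the elementary angles.
    """
    z = angle      # residual angle still to be rotated
    gain = 1.0     # accumulated scaling introduced by the micro-rotations
    for i in range(iterations):
        d = 1.0 if z >= 0 else -1.0
        step = 2.0 ** -i                      # a right shift by i in hardware
        x, y = x - d * y * step, y + d * x * step
        z -= d * math.atan(step)
        gain *= math.sqrt(1.0 + step * step)
    return x / gain, y / gain                 # compensate the constant gain
```

Rotating the unit vector `(1.0, 0.0)` by `math.pi / 6` returns values close to `(cos(pi/6), sin(pi/6))`, with an error that shrinks as `iterations` grows.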

3:30, DISPS-P3.5
A UNIFIED DESIGN FRAMEWORK FOR VECTOR ROTATIONAL CORDIC FAMILY BASED ON ANGLE QUANTIZATION PROCESS
A. WU, C. WU
Vector rotation is a key operation employed extensively in many digital signal processing applications. In this paper, we introduce a new design concept called Angle Quantization (AQ). It can be used as a design index for vector rotational operations in which the rotation angle is known in advance. Based on the AQ process, we establish a unified design framework for cost-effective, low-latency rotational algorithms and architectures. Several existing works, such as conventional CORDIC, AR-CORDIC, MVR-CORDIC, and EEAS-based CORDIC, can be fitted into the design framework, forming a Vector Rotational CORDIC Family. Based on the new design framework, we can realize high-speed/low-complexity rotational VLSI circuits without degrading the precision performance in fixed-point implementations.

3:30, DISPS-P3.6
HIGH-SPEED CORDIC IMPLEMENTATIONS USING ADVANCED CIRCUIT TECHNIQUES
G. JUNG, S. KIM, G. SOBELMAN
This paper presents results on using advanced domino circuit design techniques to implement a CORDIC processor. Skew-tolerant domino, enhanced precharged contention, non-blocking domino and pulsed reset domino circuit techniques are explained and applied to the implementation of this functional unit. For comparison purposes, a baseline design using standard two-phase domino with intermediate latches is also developed. Simulation results show that significant throughput improvement is possible using the advanced circuit techniques, with the pulsed reset style having the highest speed. On the other hand, these approaches result in increased power dissipation.

3:30, DISPS-P3.7
TIME-SHARED TMR FOR FAULT-TOLERANT CORDIC PROCESSORS
V. PIURI, J. KWAK, E. SWARTZLANDER
This paper presents a low-cost approach to concurrent error correction in high-performance CORDIC processors by using time-shared triple modular redundancy. Operands are partitioned into three sets of disjoint digits, and operations are performed three times on different hardware components to correct possible errors by majority voting. The approach incurs only a limited latency increase and throughput reduction. Pipelining can be used to maintain the same throughput as a conventional design.
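The majority-voting step mentioned above can be sketched as a bitwise vote over three result words. This shows only the generic TMR voter, not the paper's digit partitioning or time-sharing schedule; the function name is a hypothetical illustration.

```python
def majority_vote(a, b, c):
    """Bitwise majority of three integer results: each output bit follows at least two inputs."""
    return (a & b) | (a & c) | (b & c)

# One faulty copy (third argument) is outvoted by the two agreeing copies.
corrected = majority_vote(0b1011_0010, 0b1011_0010, 0b1111_0000)
```

Because the vote is taken per bit, independent single-bit errors in different copies are also masked as long as no bit position is corrupted in two copies at once.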

3:30, DISPS-P3.8
DECISION FEEDBACK EQUALIZER WITH TWO'S COMPLEMENT COMPUTATION SHARING MULTIPLICATION
H. CHOO, K. MUHAMMAD, K. ROY
We present an architecture for a high-performance decision feedback equalizer based on a computation sharing multiplier. The computation sharing multiplier (CSHMR) uses a redundant number scheme and targets removal of computational redundancy through computation re-use. Use of the CSHMR leads to high-performance FIR filtering by re-using optimal precomputations. A decision feedback equalizer (DFE) implementation based on the CSHMR in a 0.35 µm technology shows a 34% improvement in operating speed over a DFE using a Wallace tree multiplier.
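The idea of re-using precomputations across the multiplications of an FIR filter can be illustrated with a small sketch. It assumes, as an illustration only, that a bank of odd multiples of the input sample is computed once and that each non-negative integer coefficient is then applied by selecting, shifting, and adding entries from that bank. The 4-bit grouping and all names are hypothetical and are not taken from the paper.

```python
def precompute_bank(x):
    """Odd multiples 1x, 3x, ..., 15x of the input sample, computed once per sample.

    Plain multiplication is used here for brevity; hardware would use shifts and adds.
    """
    return {odd: odd * x for odd in range(1, 16, 2)}

def shared_multiply(coeff, bank):
    """Multiply the banked sample by a non-negative integer coefficient using select/shift/add."""
    result = 0
    shift = 0
    while coeff:
        nibble = coeff & 0xF
        if nibble:
            trailing = (nibble & -nibble).bit_length() - 1   # trailing zero bits
            result += bank[nibble >> trailing] << (shift + trailing)
        coeff >>= 4
        shift += 4
    return result

def fir_products(x, coeffs):
    """All coefficient-times-sample products for one input, sharing a single bank."""
    bank = precompute_bank(x)
    return [shared_multiply(c, bank) for c in coeffs]
```

For example, `fir_products(7, [3, 20, 255])` yields the same products as direct multiplication while building the bank only once for the sample.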

3:30, DISPS-P3.9
DSP-BASED SIGNAL PROCESSING FOR OFDM TRANSMISSION
M. SCHOEBINGER, S. MEIER
A demonstrator for OFDM transmission based on a programmable DSP (TMS320C6201) is described. It turns out that the rather moderate sampling rates achieved, up to 10 Msamples/s, still represent quite a challenge for state-of-the-art DSPs, both in terms of the required computational power and in the synchronization of the internal processing with the I/O interface to a real-time environment. It is illustrated that SW development under stringent resource constraints requires analysis and partitioning of the algorithms in a manner very similar to the mapping strategies necessary in an ASIC design for either cost-sensitive or extremely challenging applications. Therefore, the demonstrator development provides a sound basis for the subsequent design of such ASICs.

3:30, DISPS-P3.10
MATLAB BASED CODESIGN FRAMEWORK FOR WIRELESS BROADBAND COMMUNICATION DSPS
J. KNEIP, V. AUE, M. WEISS, M. BOLLE, G. FETTWEIS
We provide an overview of a novel MATLAB-based hardware-software design flow that has been applied to the design of a platform-based SoC for the HiperLAN/2 and IEEE 802.11a wideband wireless communication standards. Starting from a high-level algorithmic description, the MATLAB environment serves as a “golden model” and universal, cycle-true testbench for the embedded hard- and software implementation. A universal interface concept allows the exchange of modules with different abstraction levels in a cosimulation. This way, a high confidence level for the design verification is achieved, and both design and verification time are substantially reduced.