Abstract: Session DISPS-4

DISPS-4.1

Low-Power DV Encoder Architecture for Digital CMOS Camcorder
Jeff Y Hsieh,
Teresa H Meng (Stanford University)
A low-power, large-scale parallel digital video (DV) [1] encoder
architecture for a single-chip digital CMOS video camera is
discussed in this paper. This architecture is based on the single-chip
CMOS camera MPEG-2 encoder architecture proposed in [2]
with an emphasis on formatting and streaming of the compressed
data. The architecture proposed here supports the 625/25 format
of 720x576 pixels per frame. When clocked at 40 MHz, this
architecture delivers a processing performance of 1.8 billion
operations per second (BOPS) capable of supporting a frame rate of
25 fps as well as additional image enhancement processing. Low
power consumption is achieved by the use of a parallel architecture
and low-power circuit design techniques. When implemented
in a 0.2-micron CMOS technology at a 1.5 V supply voltage, the
parallel architecture consumes 45 mW, providing a power efficiency
of 40 billion operations per second per watt.
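As a rough cross-check, the headline figures fit together as follows. This uses only the numbers quoted in the abstract (1.8 BOPS, 40 MHz, 45 mW, 720x576 pixels at 25 fps) and counts "operations" in whatever sense the authors use for BOPS.

$$
\frac{1.8\times 10^{9}\ \text{ops/s}}{40\times 10^{6}\ \text{cycles/s}} = 45\ \text{ops/cycle},
\qquad
\frac{1.8\times 10^{9}\ \text{ops/s}}{45\times 10^{-3}\ \text{W}} = 4\times 10^{10}\ \text{ops/s/W} = 40\ \text{BOPS/W}
$$

$$
720 \times 576 \times 25 = 10\,368\,000\ \text{pixels/s},
\qquad
\frac{1.8\times 10^{9}\ \text{ops/s}}{10\,368\,000\ \text{pixels/s}} \approx 174\ \text{ops/pixel}
$$

An average of about 45 operations per clock cycle at only 40 MHz is consistent with the large-scale parallelism the abstract describes, and the roughly 174 operations per pixel are shared between DV encoding and the additional image enhancement processing.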

DISPS-4.2

High Performance and Cost Effective Memory Architecture for an HDTV Decoder LSI
Tetsuro Takizawa,
Junji Tajime,
Hidenobu Harasaki (C&C Media Research Laboratories, NEC Corporation)
This paper proposes an efficient memory mapping and a frame-memory compression method for an HDTV decoder LSI that uses Direct Rambus(TM) DRAM (DRDRAM). DRDRAM is employed to achieve the high memory bandwidth required for HDTV decoding at minimum memory cost. The proposed memory mapping provides memory bandwidth sufficient for HDTV decoding even in the worst case, and it requires no costly line buffers in the LSI for format conversion. The proposed frame-memory compression method halves the memory cost, enabling HDTV decoding with a single 64-Mb DRDRAM chip without loss of memory-access efficiency. Simulation results show an SNR degradation of 0.1 to 2 dB in the worst frame, with no visible degradation except in a resolution-chart sequence.

DISPS-4.3

A Programmable Processor with Multiple Functional Units and Banked Registers for General Purpose Numerical Processing
Takafumi Morifuji,
Yoshinori Takeuchi,
Masaharu Imai (Graduate School of Engineering Science, Osaka University)
We present an architecture for a General Purpose Numerical
Processor (GPNP). Owing to its programmability, a processor with
this architecture can run a wide variety of numerical and digital
signal processing tasks.
Flexibility and high performance are achieved through multiple
functional units and parallel data transfer among them.
In simulation, a prototype GPNP with five functional units operates
at a 33-MHz clock frequency; its size corresponds to 230k
transistors and 34.5 kbytes of on-chip memory.

DISPS-4.4

Fast Construction of Test Program Generators for Digital Signal Processors
Shai Rubin,
Moshe Levinger (IBM Research Division, Haifa Research Lab),
Randall R Pratt,
William P Moore (IBM Microelectronics Division, Essex Junction, Vt.)
Test-program generators play a key role in the hardware
functional verification of large-scale processors.
However, in the DSP domain, full-blown test-program
generators are much less common, mainly because of
the limited resources (time and money) available when
developing such systems. This paper describes a
work-model for the fast, low-cost construction of a
test-program generator for DSPs. The core technology
is Genesys, an established test-program generator
that until now has been used to verify large-scale
processor families such as PowerPC and x86. We
developed the model while using Genesys to verify the
IBM C54XDSP, a recently announced fixed-point DSP. The
case study shows that it is possible to build a full
test-program generator in a very short time and thus
achieve better verification coverage despite the
shorter development time.

DISPS-4.5

Engineering Change Protocols for Behavioral Synthesis
Miodrag Potkonjak,
Darko Kirovski (Computer Science Department, University of California, Los Angeles)
Rapid prototyping and the development of in-circuit and FPGA-based
emulators, key accelerators of time-to-market, have created a need
for fast error-correction mechanisms. Once an error is diagnosed,
fabricated or emulated prototypes require engineering change (EC)
that is quick and as flexible as possible. However, research on
this problem has begun only recently, and mainly in the logic
synthesis domain. We introduce the first set of EC protocols for
behavioral synthesis. The protocols support both the pre- and
post-processing EC paradigms. Moreover, instead of developing
special algorithms for EC, the usual research model, we show as a
key contribution that protocols which manipulate the constraints of
the initial design specification remove the need for specialized
EC algorithms: the EC process is performed by running the standard
optimization algorithms on the modified design. As demonstrated on
a number of behavioral synthesis tasks, including resource
assignment, design partitioning, and operation scheduling, the
approach provides variable and guaranteed flexibility for
incremental synthesis with minimal hardware overhead.

DISPS-4.6

Operation Scheduling for Parallel Functional Units Using Genetic Algorithms
Thomas Zeitlhofer,
Bernhard R Wess (INTHFT, Vienna University of Technology)
In this paper, we describe a new and efficient approach to the scheduling problem for VLIW
architectures. The scheduling times of the operations are used as the problem parameters. This
encoding, combined with a pruning technique based on critical-path analysis, significantly
reduces the size of the search space. A genetic algorithm is used to search for valid schedules of a
given length. It uses a fitness vector that guides the crossover and mutation operators,
resulting in fast convergence toward near-perfect solutions. With a different fitness function,
the method also applies to the problem of register allocation. Another advantage of the
genetic-algorithm approach is that it usually yields a large number of equally good schedules,
allowing further optimization subject to arbitrary constraints.
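A minimal Python sketch of the two ingredients named above: critical-path pruning, which confines each operation's start time to its ASAP/ALAP window for a target schedule length, and a per-operation fitness vector used to steer mutation. It assumes unit-latency operations, a single pool of `num_units` identical functional units, and operations numbered in topological order. The function names, the violation-counting fitness, and the elitist uniform-crossover loop are illustrative assumptions, not the authors' exact operators.

```python
import random
from collections import defaultdict


def asap_alap(n, edges, length):
    """Critical-path pruning: earliest (ASAP) and latest (ALAP) start step of each op.

    Assumes unit-latency ops numbered in topological order (u < v for every edge).
    """
    assert all(u < v for u, v in edges), "ops must be numbered topologically"
    preds, succs = defaultdict(list), defaultdict(list)
    for u, v in edges:
        preds[v].append(u)
        succs[u].append(v)
    asap = [0] * n
    for v in range(n):
        asap[v] = max((asap[u] + 1 for u in preds[v]), default=0)
    alap = [length - 1] * n
    for v in reversed(range(n)):
        alap[v] = min((alap[w] - 1 for w in succs[v]), default=length - 1)
    if any(asap[v] > alap[v] for v in range(n)):
        raise ValueError("schedule length is shorter than the critical path")
    return asap, alap


def fitness_vector(times, edges, num_units):
    """Per-op violation counts: precedence violations plus over-subscribed steps."""
    viol = [0] * len(times)
    for u, v in edges:
        if times[v] < times[u] + 1:  # successor must start after its predecessor
            viol[u] += 1
            viol[v] += 1
    ops_at = defaultdict(list)
    for op, t in enumerate(times):
        ops_at[t].append(op)
    for ops in ops_at.values():
        if len(ops) > num_units:  # more ops than functional units in this step
            for op in ops:
                viol[op] += 1
    return viol


def genetic_schedule(n, edges, length, num_units, pop=30, gens=200, seed=0):
    """Search for a zero-violation schedule of the given length; None if not found."""
    rng = random.Random(seed)
    asap, alap = asap_alap(n, edges, length)

    def cost(ind):
        return sum(fitness_vector(ind, edges, num_units))

    # each gene is a start time drawn only from the pruned [ASAP, ALAP] window
    population = [[rng.randint(asap[i], alap[i]) for i in range(n)] for _ in range(pop)]
    for _ in range(gens):
        population.sort(key=cost)
        if cost(population[0]) == 0:
            return population[0]
        parents = population[: pop // 2]  # elitist selection
        children = []
        while len(children) < pop - len(parents):
            a, b = rng.sample(parents, 2)
            child = [rng.choice((a[i], b[i])) for i in range(n)]  # uniform crossover
            viol = fitness_vector(child, edges, num_units)
            for i in range(n):  # mutation guided by the fitness vector
                if viol[i] and rng.random() < 0.5:
                    child[i] = rng.randint(asap[i], alap[i])  # stay inside the window
            children.append(child)
        population = parents + children
    return None
```

For example, with `edges = [(0, 2), (1, 2), (2, 3)]`, a schedule length of 4, and one functional unit, the windows are [0, 1], [0, 1], [1, 2], and [2, 3], so the search never considers start times outside those ranges.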

DISPS-4.7

Extended Retiming: Optimal Scheduling via a Graph-Theoretical Approach
Timothy W O'Neil,
Sissades Tongsima,
Edwin H.-M. H Sha (Dept. of Computer Science and Engineering, Univ. of Notre Dame)
Many iterative or recursive applications commonly found
in DSP and image processing can be represented by
data-flow graphs (DFGs). A DFG is then scheduled by
determining the starting time of each of the
application's individual tasks. The minimum length
of time required to execute all tasks once is called
the schedule length of the DFG. Much research has
attempted to optimize such applications by applying
graph transformation techniques to the DFG in order
to minimize this schedule length. One of the most
effective of these techniques is retiming. In this
paper, we demonstrate that traditional retiming does
not always achieve optimal schedules, and we propose a
new graph-transformation technique, extended retiming,
which does. We also present an algorithm for finding
an extended retiming that transforms a DFG into one
with minimal schedule length. Finally, we present a
constant-time algorithm that verifies the existence
of a retimed DFG with the minimum schedule length.
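To make "schedule length" and "retiming" concrete, here is a small Python sketch of traditional retiming only, not the paper's extended retiming. It assumes each node has an integer computation time, each edge u->v carries d(e) delays, the retimed delay count is d_r(e) = d(e) + r(u) - r(v), and, with unlimited resources, the schedule length is the longest path through zero-delay edges. The function name and graph encoding are hypothetical.

```python
from functools import lru_cache


def schedule_length(times, edges, retiming=None):
    """Longest path through zero-delay edges after applying a (traditional) retiming.

    times:    {node: computation time}
    edges:    list of (u, v, d) with d delays on edge u -> v
    retiming: {node: r(node)}; retimed delays are d_r = d + r(u) - r(v)
    """
    r = retiming or {}
    zero_delay_succ = {v: [] for v in times}
    for u, v, d in edges:
        d_r = d + r.get(u, 0) - r.get(v, 0)
        if d_r < 0:
            raise ValueError(f"illegal retiming: edge {u}->{v} would get {d_r} delays")
        if d_r == 0:
            zero_delay_succ[u].append(v)

    @lru_cache(maxsize=None)
    def longest_from(v):
        # In a legal DFG every cycle carries a delay, so zero-delay edges form a DAG.
        return times[v] + max((longest_from(w) for w in zero_delay_succ[v]), default=0)

    return max(longest_from(v) for v in times)
```

On a two-node loop A->B (no delay), B->A (two delays) with unit computation times, moving one delay through A (r(A) = 1) shortens the schedule length from 2 to 1. Because ordinary retiming only moves delays between nodes, the result can never fall below the largest single computation time.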

DISPS-4.8

Minimum Initiation Interval of Multi-Module Recurrent Signal Processing Algorithm Realization with Fixed Communication Delay
Hung-Ying Tyan (Electrical Engineering Dept, Ohio State University),
Yu Hen Hu (Dept. Electrical & Computer Engr, University of Wisconsin)
A novel iterative algorithm is proposed to compute the theoretical
minimum initiation interval of a given recurrent algorithm when there is
a known, fixed inter-module communication delay. Specifically, for a
twin-module implementation problem, a novel representation called the
necessary initiation interval is introduced to facilitate the
development of an iterative algorithm that yields both the minimum
initiation interval and the corresponding cut set of the cyclic
iterative computational dependence graph (ICDG). We also prove that
this iterative algorithm converges in a finite number of iterations.
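The classical lower bound behind any minimum initiation interval is the iteration bound: for every cycle of the dependence graph, the interval must be at least the cycle's total latency divided by its total iteration distance. The Python sketch below evaluates that bound for one fixed, hypothetical two-module assignment by adding the fixed communication delay to every edge that crosses between modules. It is a brute-force illustration under these assumptions, not the authors' necessary-initiation-interval algorithm, which also determines the corresponding cut set.

```python
from fractions import Fraction


def simple_cycles(nodes, edges):
    """All simple cycles (as edge lists), each found once from its lowest-ranked node."""
    rank = {v: i for i, v in enumerate(nodes)}
    out = {v: [] for v in nodes}
    for e in edges:
        out[e[0]].append(e)
    cycles = []

    def extend(start, v, path, on_path):
        for e in out[v]:
            w = e[1]
            if w == start:
                cycles.append(path + [e])
            elif w not in on_path and rank[w] > rank[start]:
                on_path.add(w)
                extend(start, w, path + [e], on_path)
                on_path.remove(w)

    for s in nodes:
        extend(s, s, [], {s})
    return cycles


def min_initiation_interval(times, edges, module, comm_delay):
    """Iteration bound with a fixed delay on every inter-module edge.

    times:  {node: computation time}
    edges:  list of (u, v, distance), distance = iteration lag of the dependence
    module: {node: 0 or 1}, a hypothetical two-module assignment (the cut)
    """
    bound = Fraction(0)
    for cycle in simple_cycles(list(times), edges):
        latency = sum(times[u] + (comm_delay if module[u] != module[v] else 0)
                      for u, v, _ in cycle)
        distance = sum(dist for _, _, dist in cycle)
        if distance == 0:
            raise ValueError("zero-distance cycle: the recurrence cannot be executed")
        bound = max(bound, Fraction(latency, distance))
    return bound
```

For a two-node recurrence a->b (distance 0), b->a (distance 1) with unit computation times, the bound is 2 when both nodes share a module and rises to 8 when they are split with a communication delay of 3, because the cycle crosses the cut twice.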

DISPS-4.9

A New Approach for Block-Floating-Point Arithmetic
Shiro Kobayashi,
Gerhard P Fettweis (Mobile Communications Systems, EE, Dresden University of Technology)
A new approach for implementing block-floating-point arithmetic is
proposed. This approach aims to preserve the least-significant bits
(LSBs) in order to improve signal processing quality. The LSBs are
preserved automatically and fully by hardware.
As expected, simulations of the proposed block-floating-point
implementation show improved SNRs over a conventional
block-floating-point implementation.
For the same number of bits per data word in memory, SNRs better
than those of floating-point arithmetic are also observed.
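For reference, the conventional block-floating-point scheme that the proposed method improves on can be sketched in a few lines of Python: a block shares one exponent set by its largest magnitude, and every value is stored as a signed fixed-point mantissa scaled by that exponent. Round-to-nearest with saturation and the function names are assumptions for illustration; the paper's LSB-preserving hardware is not modeled.

```python
import math


def bfp_quantize(block, mantissa_bits):
    """Conventional block floating point: one shared exponent per block.

    Returns (mantissas, exponent) with value ~= mantissa * 2**exponent, where each
    mantissa is a signed integer in [-2**(B-1), 2**(B-1) - 1], rounded to nearest
    and saturated at the range limits.
    """
    peak = max(abs(x) for x in block)
    if peak == 0:
        return [0] * len(block), 0
    # frexp gives peak = m * 2**e with 0.5 <= m < 1, so this exponent scales the
    # peak into [2**(B-2), 2**(B-1)), the top octave of the signed mantissa range.
    exponent = math.frexp(peak)[1] - (mantissa_bits - 1)
    scale = 2.0 ** exponent
    hi, lo = 2 ** (mantissa_bits - 1) - 1, -(2 ** (mantissa_bits - 1))
    mantissas = [min(hi, max(lo, round(x / scale))) for x in block]
    return mantissas, exponent


def bfp_dequantize(mantissas, exponent):
    """Reconstruct real values from block mantissas and the shared exponent."""
    return [m * 2.0 ** exponent for m in mantissas]
```

Quantizing `[0.5, 0.01, -0.25]` with 8-bit mantissas gives a shared exponent of -7 and mantissas `[64, 1, -32]`, so 0.01 comes back as 0.0078125: the block's largest value fixes the scaling, and the small value's low-order bits are lost. This LSB loss is what the proposed approach sets out to avoid.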

DISPS-4.10

The Effects of Finite Bit Precision for a VLSI Implementation of the Constant Modulus Algorithm
Louis R Litwin (Purdue University),
Thomas J Endres,
Samir N Hulyalkar (Sarnoff Digital Communications),
Michael D Zoltowski (Purdue University)
The Constant Modulus Algorithm (CMA) is one of the most popular blind equalization techniques, both in the literature and in practice, because of its LMS-like complexity and its robustness to non-ideal but practical conditions. Although CMA has been well studied in the literature, these analyses have typically assumed "infinite"-precision arithmetic. This paper is motivated by a VLSI implementation of a high-data-rate, fractionally spaced, linear forward equalizer whose taps are adapted using CMA. We examine how implementing CMA with finite bit precision affects the algorithm's performance.
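A minimal Python sketch of one CMA tap update with an optional finite-precision step, to show where bit precision enters. It assumes a filter output y = sum_k w_k x_k, the usual CMA (Godard p = 2) update w <- w - mu * y(|y|^2 - R2) x*, with R2 = E|s|^4 / E|s|^2, and models finite precision only by rounding each updated tap's real and imaginary parts to `frac_bits` fractional bits. The word lengths, rounding mode, and function names are hypothetical, not the design studied in the paper.

```python
def quantize(z, frac_bits):
    """Round real and imaginary parts to multiples of 2**-frac_bits (hypothetical tap format)."""
    step = 2.0 ** -frac_bits
    return complex(round(z.real / step) * step, round(z.imag / step) * step)


def dispersion_constant(symbols):
    """CMA constant R2 = E|s|^4 / E|s|^2 over an equiprobable constellation."""
    m4 = sum(abs(s) ** 4 for s in symbols) / len(symbols)
    m2 = sum(abs(s) ** 2 for s in symbols) / len(symbols)
    return m4 / m2


def cma_step(taps, window, mu, r2, frac_bits=None):
    """One CMA update; window[k] is the input sample multiplied by taps[k].

    Returns (new_taps, y). If frac_bits is given, the updated taps are rounded to
    that many fractional bits as a crude stand-in for finite-precision hardware.
    """
    y = sum(w * x for w, x in zip(taps, window))
    err = y * (abs(y) ** 2 - r2)  # gradient term of the constant-modulus cost
    new_taps = [w - mu * err * x.conjugate() for w, x in zip(taps, window)]
    if frac_bits is not None:
        new_taps = [quantize(w, frac_bits) for w in new_taps]
    return new_taps, y
```

With rounding to nearest, any tap increment smaller than half a tap LSB is cancelled outright, which is one simple mechanism by which coefficient word length can limit adaptation.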