Abstract: Session DISPS-3
|
DISPS-3.1
|
A New Scalable DSP Architecture for System on Chip (SoC) Domains
Matthias H Weiss,
Frank Engel,
Gerhard P Fettweis (Mobile Communications Systems Chair, EE, UoT Dresden)
The ongoing advances in semiconductor technology are the enabler
for complete System on Chip (SoC) solutions. In this SoC
domain Digital Signal Processors (DSPs) are employed to
carry out software driven digital signal processing tasks.
Although DSPs could still be modified in the SoC domain, they are
mainly employed as fixed DSP cores. Possible adaptations to the
embedding system cannot be carried out.
Thus, our work aims at designing expandable DSP architectures.
To achieve this expandability, we designed a sliced DSP
architecture. Here, the number of slices can be adapted to
system needs. Specific system requirements can be met by
adding dedicated datapaths to these slices. With this approach a
performance boost of one order of magnitude can be achieved, which
creates new demands for I/O processing. Thus, within our DSP
architecture we integrated a dedicated I/O processor.
In this paper we present this new scalable DSP architecture, tools
to map algorithms onto this DSP architecture, and the concept
of our new I/O controller. These technologies allow our DSP
architecture to be adapted easily to different system requirements.
|
DISPS-3.2
|
A New Flexible Architecture for Variable Length DCT Targeting Shape-Adaptive Transform
Thuyen Le,
Manfred Glesner (Darmstadt University of Technology, Germany)
Shape-assisted block-based texture coding methodologies such as the
shape-adaptive DCT raise the need for an architecture which can
efficiently perform the transform for variable length N. This paper
presents a 1D DCT architecture satisfying this requirement in
terms of scalability and modularity for 2 <= N <= 8. The
architecture employs a Canonical-Signed-Digit serial multiplication
to reduce hardware resources and requires only one multiplier to
perform the final scaling. A proposed algorithm searching for an
optimal assignment of cosine factors leads to a resource saving of
about 12% for the multiplication blocks if Carry-Ripple-Adders are
assumed to be used. Different area and speed requirements can be
met since only feed-forward paths are involved, which are easily
pipelined. The architecture represents a trade-off between a
time-recursive, fully modular but operation-inefficient structure
and a multiplication-efficient but irregular, fixed-length
implementation.
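As an illustration of the Canonical-Signed-Digit idea mentioned in this abstract, the following minimal Python sketch recodes an integer constant into digits from {-1, 0, 1} with no two adjacent nonzero digits, and then multiplies by that constant using only shifts, additions and subtractions. It is a software sketch under stated assumptions, not the authors' bit-serial hardware or their cosine-factor assignment algorithm; the function names `to_csd` and `csd_multiply` are hypothetical.

```python
def to_csd(n):
    """Return the canonical signed-digit form of a non-negative integer.

    Digits are in {-1, 0, 1}, least significant first, and no two
    adjacent digits are both nonzero.
    """
    if n < 0:
        raise ValueError("n must be non-negative")
    digits = []
    while n != 0:
        if n & 1:
            # Pick +1 when n mod 4 == 1 and -1 when n mod 4 == 3,
            # which forces the next digit to be zero.
            d = 2 - (n & 3)
            n -= d
        else:
            d = 0
        digits.append(d)
        n >>= 1
    return digits


def csd_multiply(x, digits):
    """Multiply x by the constant encoded in digits via shift and add/subtract."""
    acc = 0
    for shift, d in enumerate(digits):
        if d == 1:
            acc += x << shift
        elif d == -1:
            acc -= x << shift
    return acc
```

For example, `to_csd(7)` yields the digits for 8 - 1, so a multiplication by 7 needs one subtraction instead of two additions.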
|
DISPS-3.3
|
An adaptive block-matching algorithm for motion estimation
Vasily G. Moshnyaga (Fukuoka University)
A new adaptive algorithm for block-matching motion
estimation is presented. The algorithm works in a full-search
fashion but, unlike the full-search block-matching algorithm (FSBMA),
it adjusts the number of computations dynamically to picture variation.
Owing to an incorporated data-driven thresholding mechanism, the
proposed approach performs four times fewer operations than the FSBMA
while maintaining the same quality of results. Its hardware
implementation is simple and compact. A supporting hardware design and
simulation results on benchmarks are outlined.
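The abstract does not specify the thresholding rule, so the sketch below substitutes a well-known stand-in: a full search in which the running best distortion acts as a data-driven cutoff, and a candidate is abandoned as soon as its partial sum of absolute differences reaches that cutoff. This is an assumption made for illustration, not the author's algorithm; the names `sad_with_cutoff` and `block_match` and the operation counter are hypothetical.

```python
def sad_with_cutoff(cur, ref, bx, by, dx, dy, n, cutoff):
    """Sum of absolute differences for one candidate, abandoned row by row
    once the partial sum reaches cutoff. Returns (sum, operations used)."""
    total = 0
    ops = 0
    for y in range(n):
        for x in range(n):
            total += abs(cur[by + y][bx + x] - ref[by + dy + y][bx + dx + x])
            ops += 1
        if total >= cutoff:
            break  # this candidate can no longer beat the current best
    return total, ops


def block_match(cur, ref, bx, by, n, search, adaptive=True):
    """Search all displacements in [-search, search]^2 for the n x n block
    of cur whose top-left corner is (bx, by).
    Returns (best vector, best SAD, total operations)."""
    height, width = len(ref), len(ref[0])
    best_vec, best_sad = None, float("inf")
    ops = 0
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            # Skip candidates that fall outside the reference frame.
            if by + dy < 0 or by + dy + n > height:
                continue
            if bx + dx < 0 or bx + dx + n > width:
                continue
            cutoff = best_sad if adaptive else float("inf")
            sad, used = sad_with_cutoff(cur, ref, bx, by, dx, dy, n, cutoff)
            ops += used
            if sad < best_sad:
                best_vec, best_sad = (dx, dy), sad
    return best_vec, best_sad, ops
```

Because a candidate is only abandoned when it cannot improve on the current best, the adaptive search returns the same vector as the exhaustive one; how many operations it saves depends on the picture content.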
|
DISPS-3.4
|
Hardware Architecture for Real-Time Distance Transform
Jarmo H Takala (Tampere University of Technology),
Jouko O Viitanen (VTT Automation),
Jukka P.P Saarinen (Tampere University of Technology)
A distance transform (DT) converts a binary image
consisting of foreground (feature) and background
(non-feature) pixels into a gray level image where
each pixel contains the distance from the
corresponding pixel to the nearest foreground pixel.
The computation of the exact Euclidean DT is
a computationally complex task and, therefore,
approximations are typically utilized. In this paper,
an area-efficient architecture for computing a DT
approximation is presented. The architecture utilizes
order-based encoded distance representation allowing
simple bitwise operations to be used for
determining the distance to the nearest foreground
pixel in the constrained neighborhood. Tabulated
distance values are used, so cumulative errors
are avoided. Owing to the simple operations,
real-time operation can be expected.
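For readers unfamiliar with DT approximations, the following sketch shows the conventional two-pass chamfer transform, in which each pixel takes the minimum over a constrained neighborhood of already processed pixels. It is not the order-based encoded scheme of this paper (which uses tabulated values and bitwise operations); the 3-4 weights and the name `chamfer_dt` are assumptions chosen for illustration.

```python
def chamfer_dt(img, a=3, b=4):
    """Two-pass chamfer distance transform of a binary image.

    img is a list of rows; nonzero pixels are foreground. The result holds,
    for each pixel, an approximate distance to the nearest foreground pixel
    in units where a horizontal or vertical step costs a and a diagonal
    step costs b.
    """
    h, w = len(img), len(img[0])
    inf = float("inf")
    d = [[0 if img[y][x] else inf for x in range(w)] for y in range(h)]

    # Forward pass: propagate from the left and from the row above.
    for y in range(h):
        for x in range(w):
            if x > 0:
                d[y][x] = min(d[y][x], d[y][x - 1] + a)
            if y > 0:
                d[y][x] = min(d[y][x], d[y - 1][x] + a)
                if x > 0:
                    d[y][x] = min(d[y][x], d[y - 1][x - 1] + b)
                if x < w - 1:
                    d[y][x] = min(d[y][x], d[y - 1][x + 1] + b)

    # Backward pass: propagate from the right and from the row below.
    for y in range(h - 1, -1, -1):
        for x in range(w - 1, -1, -1):
            if x < w - 1:
                d[y][x] = min(d[y][x], d[y][x + 1] + a)
            if y < h - 1:
                d[y][x] = min(d[y][x], d[y + 1][x] + a)
                if x < w - 1:
                    d[y][x] = min(d[y][x], d[y + 1][x + 1] + b)
                if x > 0:
                    d[y][x] = min(d[y][x], d[y + 1][x - 1] + b)
    return d
```

Note that this conventional form accumulates local step costs, which is the kind of cumulative error the tabulated approach in the abstract is said to avoid.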
|
DISPS-3.5
|
AN EFFICIENT VLC DECOMPRESSION SCHEME FOR USER-DEFINED CODING TABLES
Bai-Jue Shieh,
Chen-Yi Lee (Department of Electronics Engineering, National Chiao Tung University)
With the increase of information and data types, a high-throughput
and flexible memory-based variable-length code (VLC) decoder is
required for user-defined coding tables to achieve a higher
compression ratio. In this paper, we present a memory-based VLC
decoder which is well suited to applications with user-defined
tables. By loading data into the memories in parallel, the coding
tables can be changed in much less time. The codeword-boundary
prediction algorithm breaks the recursive dependency of the decoding
procedure. As a result, the VLC decoder can be
realized on a multi-processor architecture and hence the decoding
throughput is enhanced significantly. Additionally, INDEX-OFFSET
symbols are presented, which can recover all data using pure VLC
codewords and a smaller table size. Simulation
results show that the combination of the proposed VLC decoder
and a user-defined table can achieve a high decompression rate. The
decoder is therefore well suited to high data rate applications
with user-defined coding tables, such as MPEG-4.
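The general idea of memory-based VLC decoding with a replaceable table can be sketched as follows: a lookup memory indexed by the next few bits returns the symbol and its codeword length, and loading a different user-defined table simply means rebuilding that memory. This sketch is sequential and does not model the codeword-boundary prediction or the INDEX-OFFSET symbols of the paper; the code table and the names `build_lut` and `decode` are hypothetical.

```python
def build_lut(codes):
    """Build a lookup table from a prefix-free code given as {symbol: bit string}.

    The table has 2**max_len entries; every index whose leading bits equal a
    codeword maps to (symbol, codeword length).
    """
    max_len = max(len(code) for code in codes.values())
    lut = [None] * (1 << max_len)
    for sym, code in codes.items():
        pad = max_len - len(code)
        base = int(code, 2) << pad
        for i in range(1 << pad):
            lut[base + i] = (sym, len(code))
    return lut, max_len


def decode(bits, lut, max_len):
    """Decode a bit string by repeated table lookups on a max_len-bit window."""
    out = []
    pos = 0
    while pos < len(bits):
        # Pad the final window with zeros so that it always has max_len bits.
        window = bits[pos:pos + max_len].ljust(max_len, "0")
        entry = lut[int(window, 2)]
        if entry is None:
            raise ValueError("bit pattern matches no codeword")
        sym, length = entry
        if pos + length > len(bits):
            raise ValueError("truncated codeword at end of stream")
        out.append(sym)
        pos += length
    return out
```

With the illustrative table `{"a": "0", "b": "10", "c": "110", "d": "111"}`, the stream `0101100111` splits into the codewords 0, 10, 110, 0 and 111.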
|
DISPS-3.6
|
AN ANALOG ASSOCIATIVE MEMORY CHIP FOR VQ IMAGE COMPRESSION
Loris Navoni,
Monica Besana,
Pier Luigi Rolandi (STMicroelectronics)
This paper presents a hardware implementation of Full-Search Vector Quantization Image Compression using an associative memory chip based on analog flash technology.
Taking advantage of the features of this architecture, which performs a parallel search over a codebook of 4K 64-element codewords in 4.6 microseconds, encouraging results have been obtained in terms of perceived image quality and computation speed.
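In software terms, full-search vector quantization compares each input vector against every codeword and transmits the index of the closest one. The sketch below does this sequentially with a squared Euclidean distance, which is an assumption; the chip described above performs the comparison in parallel in analog hardware, and its distance measure is not stated here. The function names are hypothetical.

```python
def nearest_codeword(vec, codebook):
    """Return the index of the codeword closest to vec (squared Euclidean distance)."""
    best_idx, best_dist = -1, float("inf")
    for idx, codeword in enumerate(codebook):
        dist = sum((a - b) ** 2 for a, b in zip(vec, codeword))
        if dist < best_dist:
            best_idx, best_dist = idx, dist
    return best_idx


def vq_encode(vectors, codebook):
    """Replace each input vector by the index of its nearest codeword."""
    return [nearest_codeword(v, codebook) for v in vectors]


def vq_decode(indices, codebook):
    """Reconstruct an approximation of the input from codeword indices."""
    return [codebook[i] for i in indices]
```

For an image, each vector would typically be a flattened pixel block, for instance an 8 x 8 block for 64-element codewords.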
|
DISPS-3.7
|
A VLSI IMPLEMENTATION OF A REVERSIBLE VARIABLE LENGTH ENCODER/DECODER
Mario Novell (Electrical Engineering Department, University of California, Los Angeles),
Steve Molloy (Luxxon Corporation)
Variable Length Codes (VLCs) are known for their efficient
compression, but are susceptible to noisy environments due
to synchronization losses that can occur from bit error propagation.
Recent interest in Reversible Variable Length Codes (RVLCs)
has come about due to the growing need for wireless exchange
of compressed image and video signals over noisy channels and the
ability of RVLCs to provide greater error robustness than their
non-reversible counterparts (VLCs). With the current ITU H.263+
and ISO MPEG-4 standards already using RVLCs, low power implementations
of the RVLC are essential in providing error robustness
in real-time systems, while minimizing power consumption.
This paper presents the first published VLSI architectures of a
low power reversible variable length encoder and decoder.
Results show power consumption of less than 1 mW for both
encoder and decoder, with a 65% increase in area for
the decoder over that of a conventional variable length decoder (VLD) design.
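The defining property of an RVLC is that its codewords are both prefix-free and suffix-free, so a bitstream can be decoded from either end. The following sketch demonstrates this with a small symmetric code table that is purely illustrative and not taken from H.263+ or MPEG-4; the function names are hypothetical, and nothing here models the VLSI architecture of the paper.

```python
def is_reversible(codes):
    """Check that no codeword is a prefix or a suffix of another codeword."""
    words = list(codes.values())
    for a in words:
        for b in words:
            if a != b and (b.startswith(a) or b.endswith(a)):
                return False
    return True


def encode(symbols, codes):
    """Concatenate the codewords of the given symbols."""
    return "".join(codes[s] for s in symbols)


def decode_forward(bits, codes):
    """Decode from the first bit, emitting a symbol whenever a codeword completes."""
    inverse = {code: sym for sym, code in codes.items()}
    out, current = [], ""
    for bit in bits:
        current += bit
        if current in inverse:
            out.append(inverse[current])
            current = ""
    if current:
        raise ValueError("incomplete codeword at end of stream")
    return out


def decode_backward(bits, codes):
    """Decode from the last bit by reversing both the stream and the codewords."""
    reversed_codes = {sym: code[::-1] for sym, code in codes.items()}
    return decode_forward(bits[::-1], reversed_codes)[::-1]
```

After a bit error, a decoder can resume from the end of the packet and recover symbols that a forward-only VLC decoder would lose to synchronization loss.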
|
DISPS-3.8
|
Parametric Spectral Estimation on a Single FPGA
Stephen J Bellis,
William P Marnane (National Microelectronics Research Centre, University College Cork),
Peter J Fish (School of Electronic Engineering & Computer Systems, University of Wales, Bangor)
Parametric, model based, spectral estimation techniques
can offer increased frequency resolution over
conventional short-term fast Fourier transform methods,
overcoming limitations caused by the windowing of
sampled, time domain, input data. However, parametric
techniques are significantly more computationally
demanding than the Fourier based methods and require a
wider range of arithmetic functionality; for example,
operations such as division and square-root are often
necessary. These arithmetic processes exhibit
communication bottlenecks and their hardware
implementation can be inefficient when used in
conjunction with multipliers. A programmable,
bit-serial multiplier/divider, which overcomes the
bottleneck problems by using a data interleaving
scheme, is introduced in this paper. This interleaved
processor is used to show how the parametric Modified
Covariance spectral estimator can be efficiently routed
on a field programmable gate array for real-time
applications.
|
|