Authors:
Matthias H Weiss,
Frank Engel,
Gerhard P Fettweis,
Page (NA) Paper number 2301
Abstract:
The ongoing advances in semiconductor technology enable complete
System-on-Chip (SoC) solutions. In the SoC domain, Digital Signal
Processors (DSPs) are employed to carry out software-driven digital
signal processing tasks. Although DSPs could in principle still be
modified in the SoC domain, they are mainly employed as fixed DSP
cores that cannot be adapted to the embedding system. Our work
therefore targets the design of expandable DSP architectures. To
achieve this expandability, we designed a sliced DSP architecture in
which the number of slices can be adapted to system needs. Specific
system requirements can be met by adding dedicated datapaths to these
slices. This approach can boost performance by one order of magnitude,
which in turn creates new demands on I/O processing. We therefore
integrated a dedicated I/O processor into our DSP architecture.
In this paper we present this new scalable DSP architecture, tools
to map algorithms onto it, and the concept of our new I/O controller.
Together, these technologies allow our DSP architecture to be easily
adapted to different system requirements.
Authors:
Thuyen Le, Darmstadt University of Technology, Germany (Germany)
Manfred Glesner, Darmstadt University of Technology, Germany (Germany)
Page (NA) Paper number 1199
Abstract:
Shape-assisted block-based texture coding methods such as the
shape-adaptive DCT call for an architecture that can efficiently
perform the transform for a variable length N. This paper presents
a 1D DCT architecture that meets this requirement in terms of
scalability and modularity for 2 <= N <= 8. The architecture employs
Canonical-Signed-Digit (CSD) serial multiplication to reduce hardware
resources and requires only one multiplier to perform the final scaling.
A proposed algorithm that searches for an optimal assignment of cosine
factors saves about 12% of the resources of the multiplication blocks,
assuming carry-ripple adders are used. Different area and speed
requirements can be met, since only feed-forward paths are involved
and these are easily pipelined. The architecture represents a trade-off
between time-recursive structures, which are fully modular but
inefficient in operation count, and multiplication-efficient
implementations, which are irregular and restricted to a fixed length.
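As a rough illustration of CSD constant multiplication (not the paper's cosine-factor assignment algorithm, which the abstract does not detail), the Python sketch below quantizes the DCT-II cosine factors for 2 <= N <= 8 to integers, recodes each into canonical signed digit form (digits -1, 0, +1 with no two adjacent nonzero digits), and multiplies by shift-and-add. The 8-bit fractional precision (`FRAC_BITS`) and all function names are assumptions made for illustration. Up to sign, the nontrivial factors of an unnormalized length-N DCT-II are cos(m*pi/(2N)) for m = 1..N-1; normalization is left to a separate final scaling step.

```python
import math
from typing import List

FRAC_BITS = 8  # assumed fixed-point precision of the cosine factors


def csd(n: int) -> List[int]:
    """Canonical signed digit (non-adjacent form) of integer n, LSB first.
    Every digit is -1, 0 or +1, and no two adjacent digits are nonzero."""
    digits = []
    while n != 0:
        if n & 1:
            d = 2 - (n & 3)  # +1 if n % 4 == 1, -1 if n % 4 == 3
            n -= d
        else:
            d = 0
        digits.append(d)
        n >>= 1
    return digits


def csd_multiply(x: int, digits: List[int]) -> int:
    """Multiply x by the constant encoded in `digits` using only shifts
    and additions/subtractions, as a CSD constant multiplier would."""
    acc = 0
    for shift, d in enumerate(digits):
        if d == 1:
            acc += x << shift
        elif d == -1:
            acc -= x << shift
    return acc


def dct_cosine_factors(n: int, frac_bits: int = FRAC_BITS) -> List[int]:
    """Quantized magnitudes cos(m*pi/(2N)), m = 1..N-1, of a length-N DCT-II."""
    return [round(math.cos(m * math.pi / (2 * n)) * (1 << frac_bits))
            for m in range(1, n)]


def adder_count(q: int) -> int:
    """Adders/subtractors needed for a CSD multiplication by constant q."""
    nonzero = sum(1 for d in csd(q) if d != 0)
    return max(nonzero - 1, 0)
```

For example, `csd(7)` returns `[-1, 0, 0, 1]`, i.e. 7 = 8 - 1, so multiplying by 7 costs one subtraction instead of the two additions implied by binary 111. In general, a constant with z nonzero CSD digits needs z - 1 adders.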
Authors:
Vasily G. Moshnyaga, Fukuoka University (Japan)
Page (NA) Paper number 2389
Abstract:
A new adaptive algorithm for block-matching motion estimation is
presented. The algorithm works in full-search fashion but, unlike the
full-search block-matching algorithm (FSBMA), dynamically adjusts the
number of computations to picture variation. Owing to an incorporated
data-driven thresholding mechanism, the proposed approach performs
one quarter as many operations as the FSBMA while maintaining the same
quality of results. Its hardware implementation is simple and compact.
A supporting hardware design and simulation results on benchmarks are
outlined.
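Full-search block matching evaluates the sum of absolute differences (SAD) for every candidate displacement. The abstract does not specify its data-driven thresholding, so the Python sketch below instead shows a well-known related idea, partial distortion elimination: the SAD accumulation for a candidate is aborted as soon as it reaches the best SAD found so far. This returns the same motion vector as exhaustive full search while typically performing fewer pixel operations. The function names, the row-wise threshold check, and the boundary handling are assumptions.

```python
from typing import List, Tuple

Frame = List[List[int]]


def block_sad(cur: Frame, ref: Frame, bx: int, by: int, dx: int, dy: int,
              bs: int, limit: float) -> Tuple[int, int]:
    """SAD between the bs x bs block of `cur` at (bx, by) and the block of
    `ref` displaced by (dx, dy). Stops after any row once the partial sum
    reaches `limit`. Returns (sad_or_partial_sad, pixel_operations)."""
    total = 0
    ops = 0
    for y in range(bs):
        c_row = cur[by + y]
        r_row = ref[by + dy + y]
        for x in range(bs):
            total += abs(c_row[bx + x] - r_row[bx + dx + x])
            ops += 1
        if total >= limit:  # this candidate can no longer beat the best one
            break
    return total, ops


def full_search(cur: Frame, ref: Frame, bx: int, by: int, bs: int,
                search_range: int, early_exit: bool = True):
    """Block matching over all displacements in [-search_range, search_range]^2.
    Returns (best_motion_vector, best_sad, total_pixel_operations)."""
    h, w = len(ref), len(ref[0])
    best_mv, best_sad, ops = (0, 0), float("inf"), 0
    for dy in range(-search_range, search_range + 1):
        for dx in range(-search_range, search_range + 1):
            # skip candidate blocks that would leave the reference frame
            if not (0 <= by + dy <= h - bs and 0 <= bx + dx <= w - bs):
                continue
            limit = best_sad if early_exit else float("inf")
            sad, n = block_sad(cur, ref, bx, by, dx, dy, bs, limit)
            ops += n
            if sad < best_sad:
                best_mv, best_sad = (dx, dy), sad
    return best_mv, best_sad, ops
```

Calling `full_search` with `early_exit=False` gives the plain full search, which makes it easy to compare the operation counts of both variants on the same frames.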
Authors:
Jarmo H Takala,
Jouko O Viitanen,
Jukka P.P Saarinen,
Page (NA) Paper number 1128
Abstract:
A distance transform (DT) converts a binary image consisting of foreground
(feature) and background (non-feature) pixels into a gray-level image
in which each pixel holds the distance from that pixel to the nearest
foreground pixel. Computing the exact Euclidean DT is a computationally
complex task, so approximations are typically used. In this paper, an
area-efficient architecture for computing a DT approximation is
presented. The architecture uses an order-based encoded distance
representation that allows simple bitwise operations to determine the
distance to the nearest foreground pixel within the constrained
neighborhood. Because tabulated distance values are used, cumulative
errors are avoided. Owing to the simplicity of these operations,
real-time operation can be expected.
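For comparison, a classic way to approximate the Euclidean DT is the two-pass 3-4 chamfer transform sketched below in Python. It is not the paper's order-based encoding: it propagates local step costs (3 for an axial step, 4 for a diagonal one) in a forward and a backward raster scan, so its deviation from the Euclidean distance builds up along the propagation path, presumably the kind of cumulative error that tabulated distance values are meant to avoid. Dividing the result by 3 gives an approximate distance in pixels. The function name and the `BIG` sentinel are illustrative assumptions.

```python
from typing import List

BIG = 10 ** 9  # stands in for "no foreground pixel reached yet"


def chamfer_34(img: List[List[int]]) -> List[List[int]]:
    """3-4 chamfer distance to the nearest foreground (nonzero) pixel."""
    h, w = len(img), len(img[0])
    d = [[0 if img[y][x] else BIG for x in range(w)] for y in range(h)]
    # forward pass: top-left to bottom-right, neighbors above and to the left
    for y in range(h):
        for x in range(w):
            v = d[y][x]
            if x > 0:
                v = min(v, d[y][x - 1] + 3)
            if y > 0:
                v = min(v, d[y - 1][x] + 3)
                if x > 0:
                    v = min(v, d[y - 1][x - 1] + 4)
                if x < w - 1:
                    v = min(v, d[y - 1][x + 1] + 4)
            d[y][x] = v
    # backward pass: bottom-right to top-left, neighbors below and to the right
    for y in range(h - 1, -1, -1):
        for x in range(w - 1, -1, -1):
            v = d[y][x]
            if x < w - 1:
                v = min(v, d[y][x + 1] + 3)
            if y < h - 1:
                v = min(v, d[y + 1][x] + 3)
                if x < w - 1:
                    v = min(v, d[y + 1][x + 1] + 4)
                if x > 0:
                    v = min(v, d[y + 1][x - 1] + 4)
            d[y][x] = v
    return d
```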
Authors:
Bai-Jue Shieh,
Chen-Yi Lee,
Page (NA) Paper number 1611
Abstract:
As the amount and variety of data types grow, high-throughput, flexible
memory-based variable length code (VLC) decoders that support
user-defined coding tables are required to achieve higher compression
ratios. In this paper, we present a memory-based VLC decoder that is
well suited to applications with user-defined tables. By loading data
into the memories in parallel, the coding tables can be changed in
much less time. The proposed codeword-boundary prediction algorithm
breaks the recursive dependency of the decoding procedure. As a result,
the VLC decoder can be realized on a multiprocessor architecture, which
significantly enhances decoding throughput. Additionally, INDEX-OFFSET
symbols are presented that can recover all data with pure VLC codewords
and a smaller table size. Simulation results show that the combination
of the proposed VLC decoder and user-defined tables achieves a high
decompression rate, making it well suited to high-data-rate
applications with user-defined coding tables, such as MPEG-4.
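A memory-based VLC decoder can be pictured as a lookup table indexed by the next K bits of the stream, where K is the longest codeword length; each entry holds the decoded symbol and its codeword length. The Python sketch below builds such a table from a user-defined code and decodes a bit string with it. It is a generic single-level table, not the paper's memory organization, and the names are hypothetical. Each step needs the previous codeword's length before it knows where the next codeword starts; this is the recursive dependency that the paper's codeword-boundary prediction targets, and it is not modeled here.

```python
from typing import Dict, List, Optional, Tuple

Entry = Optional[Tuple[str, int]]  # (symbol, codeword length) or None


def build_lut(code: Dict[str, str]) -> Tuple[int, List[Entry]]:
    """Table with 2**K entries, K = longest codeword. Every index whose top
    bits equal a codeword maps to that codeword's symbol and length."""
    k = max(len(cw) for cw in code.values())
    lut: List[Entry] = [None] * (1 << k)
    for sym, cw in code.items():
        pad = k - len(cw)
        base = int(cw, 2) << pad
        for i in range(1 << pad):
            if lut[base + i] is not None:
                raise ValueError("code is not prefix-free")
            lut[base + i] = (sym, len(cw))
    return k, lut


def decode_vlc(bits: str, code: Dict[str, str]) -> List[str]:
    """Decode a string of '0'/'1' characters with a single table lookup per symbol."""
    k, lut = build_lut(code)
    padded = bits + "0" * k  # lets the last lookup window read past the end
    out: List[str] = []
    pos = 0
    while pos < len(bits):
        entry = lut[int(padded[pos:pos + k], 2)]
        if entry is None:
            raise ValueError(f"invalid codeword at bit {pos}")
        sym, length = entry
        if pos + length > len(bits):
            raise ValueError("truncated codeword at end of stream")
        out.append(sym)
        pos += length  # the next start position depends on this length
    return out
```

Changing the coding table only requires rebuilding `lut`, which is the software analogue of reloading the decoder memories.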
Authors:
Loris Navoni,
Monica Besana,
Pier Luigi Rolandi,
Page (NA) Paper number 1875
Abstract:
This paper presents a hardware implementation of full-search vector
quantization (VQ) image compression using an associative memory chip
based on analog flash technology. The chip searches a codebook of 4K
64-element codewords in parallel in 4.6 µs. Taking advantage of this
capability, encouraging results have been obtained in terms of
perceived image quality and computation speed.
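Full-search VQ encoding compares every image block, flattened to a vector, with every codeword and emits the index of the closest one; the chip described above performs this search over all codewords in parallel. The sequential Python sketch below shows the same computation. The abstract does not say which distortion measure the chip uses, so squared Euclidean distance is assumed, as are the 8x8 block size (giving 64-element vectors) and the function names.

```python
from typing import List, Sequence, Tuple


def nearest_codeword(vec: Sequence[float],
                     codebook: Sequence[Sequence[float]]) -> Tuple[int, float]:
    """Full search: index and squared Euclidean distance of the closest codeword."""
    best_idx, best_dist = -1, float("inf")
    for i, cw in enumerate(codebook):
        dist = sum((a - b) ** 2 for a, b in zip(vec, cw))
        if dist < best_dist:
            best_idx, best_dist = i, dist
    return best_idx, best_dist


def image_to_vectors(img: List[List[int]], b: int) -> List[List[int]]:
    """Split an image (list of rows) into non-overlapping b x b blocks, each
    flattened row by row; partial blocks at the right/bottom edges are dropped."""
    h, w = len(img), len(img[0])
    return [[img[by + y][bx + x] for y in range(b) for x in range(b)]
            for by in range(0, h - b + 1, b)
            for bx in range(0, w - b + 1, b)]


def vq_encode(img: List[List[int]], codebook, b: int = 8) -> List[int]:
    """Replace every b x b block by the index of its nearest codeword."""
    return [nearest_codeword(v, codebook)[0] for v in image_to_vectors(img, b)]


def vq_decode(indices: List[int], codebook, blocks_per_row: int,
              b: int = 8) -> List[List[int]]:
    """Rebuild an image by pasting the indexed codewords back as blocks."""
    rows = (len(indices) // blocks_per_row) * b
    img = [[0] * (blocks_per_row * b) for _ in range(rows)]
    for n, idx in enumerate(indices):
        by, bx = divmod(n, blocks_per_row)
        cw = codebook[idx]
        for y in range(b):
            for x in range(b):
                img[by * b + y][bx * b + x] = cw[y * b + x]
    return img
```

Decoding is a plain table lookup, so the cost of VQ sits almost entirely in the encoder's search, which is what the associative memory parallelizes.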
Authors:
Mario Novell,
Steve Molloy,
Page (NA) Paper number 2000
Abstract:
Variable Length Codes (VLCs) are known for their efficient compression,
but they are vulnerable in noisy environments, where bit errors
propagate and cause loss of synchronization. Recent interest in
Reversible Variable Length Codes (RVLCs) stems from the growing need
to exchange compressed image and video signals wirelessly over noisy
channels, and from the ability of RVLCs to provide greater error
robustness than their non-reversible counterparts (VLCs). Since the
current ITU H.263+ and ISO MPEG-4 standards already use RVLCs,
low-power RVLC implementations are essential for providing error
robustness in real-time systems while minimizing power consumption.
This paper presents the first published VLSI architectures of a
low-power reversible variable length encoder and decoder. Results show
a power consumption of less than 1 mW for both the encoder and the
decoder, with the decoder occupying 65% more area than a conventional
variable length decoder (VLD) design.
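A reversible VLC is both prefix-free and suffix-free, so a bit string can be decoded from either end. The Python sketch below uses a small, hypothetical code made of palindromic codewords, which makes the suffix-free property automatic, and decodes the same stream forward and backward; it is not one of the H.263+ or MPEG-4 code tables. A conventional prefix code such as {0, 10, 11} fails the reversibility check: read backward, the codeword 0 is a prefix of 01 (the reversal of 10).

```python
from typing import Dict, List, Sequence

# Hypothetical RVLC: palindromic codewords, prefix-free (hence also suffix-free).
CODE: Dict[str, str] = {"a": "0", "b": "11", "c": "101", "d": "1001"}


def is_prefix_free(words: Sequence[str]) -> bool:
    n = len(words)
    return all(i == j or not words[j].startswith(words[i])
               for i in range(n) for j in range(n))


def is_reversible(code: Dict[str, str]) -> bool:
    """True if the code is decodable both forward and backward."""
    words = list(code.values())
    return is_prefix_free(words) and is_prefix_free([w[::-1] for w in words])


def encode(symbols: Sequence[str], code: Dict[str, str]) -> str:
    return "".join(code[s] for s in symbols)


def decode_forward(bits: str, code: Dict[str, str]) -> List[str]:
    """Greedy prefix decoding, valid because the code is prefix-free."""
    inverse = {cw: sym for sym, cw in code.items()}
    max_len = max(len(cw) for cw in code.values())
    out: List[str] = []
    cur = ""
    for b in bits:
        cur += b
        if cur in inverse:
            out.append(inverse[cur])
            cur = ""
        elif len(cur) >= max_len:
            raise ValueError("invalid codeword")
    if cur:
        raise ValueError("truncated codeword at end of stream")
    return out


def decode_backward(bits: str, code: Dict[str, str]) -> List[str]:
    """Decode from the end by reading reversed bits with reversed codewords."""
    reversed_code = {sym: cw[::-1] for sym, cw in code.items()}
    return decode_forward(bits[::-1], reversed_code)[::-1]
```

When a bit error stops forward decoding, a decoder can restart from the far end of the damaged segment with `decode_backward` and recover the symbols that follow the error; this sketch only decodes complete, error-free strings.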
Authors:
Stephen J Bellis,
William P Marnane,
Peter J Fish,
Page (NA) Paper number 1563
Abstract:
Parametric (model-based) spectral estimation techniques can offer higher
frequency resolution than conventional short-term fast Fourier transform
methods, overcoming limitations caused by windowing the sampled
time-domain input data. However, parametric techniques are significantly
more computationally demanding than Fourier-based methods and require
a wider range of arithmetic functionality; for example, operations
such as division and square root are often necessary. These arithmetic
operations exhibit communication bottlenecks, and their hardware
implementation can be inefficient when combined with multipliers. This
paper introduces a programmable bit-serial multiplier/divider that
overcomes these bottlenecks by using a data-interleaving scheme. This
interleaved processor is used to show how the parametric Modified
Covariance spectral estimator can be efficiently mapped onto a field
programmable gate array (FPGA) for real-time applications.
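To make the Modified Covariance estimator concrete, the Python sketch below fits an order-p autoregressive (AR) model by minimizing the sum of the forward and backward linear prediction error powers, solves the resulting p x p normal equations by Gaussian elimination, and evaluates the AR power spectrum. It says nothing about the bit-serial, interleaved hardware; it only shows where divisions arise (in the solver and in the spectrum). The function names and the noise-variance normalization E/(2(N - p)) are assumptions.

```python
import cmath
import math
from typing import List, Sequence, Tuple


def _solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[piv][col]) < 1e-12:
            raise ValueError("normal equations are singular")
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        s = m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = s / m[r][r]
    return x


def modified_covariance(x: Sequence[float], p: int) -> Tuple[List[float], float]:
    """AR coefficients a[1..p] (a[0] = 1 implied) and noise variance estimate."""
    n_samples = len(x)
    if not 0 < p < n_samples:
        raise ValueError("need 0 < p < len(x)")

    def c(j: int, k: int) -> float:
        # forward-plus-backward correlation term of the normal equations
        return sum(x[n - j] * x[n - k] + x[n - p + j] * x[n - p + k]
                   for n in range(p, n_samples))

    A = [[c(j, k) for k in range(1, p + 1)] for j in range(1, p + 1)]
    b = [-c(j, 0) for j in range(1, p + 1)]
    a = _solve(A, b)
    # minimum forward + backward error power; clamp tiny negative round-off
    e = c(0, 0) + sum(a[k - 1] * c(0, k) for k in range(1, p + 1))
    return a, max(e, 0.0) / (2 * (n_samples - p))


def ar_psd(a: Sequence[float], sigma2: float, f: float) -> float:
    """AR power spectrum at normalized frequency f (cycles per sample)."""
    denom = 1 + sum(ak * cmath.exp(-2j * math.pi * f * k)
                    for k, ak in enumerate(a, start=1))
    return sigma2 / abs(denom) ** 2
```

For a noise-free sinusoid cos(w*n + phi) and p = 2, both prediction errors vanish for a = [-2*cos(w), 1], so the estimator recovers these coefficients exactly up to round-off.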