HARDWARE FOR IMAGE AND VIDEO CODING

Chair: Konstantinos Konstantinides, Hewlett-Packard Laboratories (USA)

Algorithm-Based Low-Power Transform Coding Architectures

Authors:

An-Yeu Wu, University of Maryland (USA)
K.J. Ray Liu, University of Maryland (USA)

Volume 5, Page 3267

Abstract:

In most low-power VLSI designs, the supply voltage is reduced to lower the total power consumption. However, device speed degrades as the supply voltage goes down. In this paper, we propose new algorithmic-level techniques, based on the multirate approach, for compensating for the increased delays. We show how to compute most of the discrete sinusoidal transforms through decimated low-speed sequences with reasonable linear hardware overhead. For a decimation factor of two, the overall power consumption can be reduced to about one-third of that of the original design. The resulting multirate low-power architectures are regular, modular, and free of global communications, properties that are well suited to VLSI implementation. The proposed architectures can also be applied to very high-speed block transforms where only low-speed operators are required.
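
As a rough illustration of the voltage and speed trade-off described in this abstract, the sketch below uses the standard first-order CMOS dynamic power model P = C_eff * Vdd^2 * f. The supply voltages and the capacitance overhead factor are illustrative assumptions, not figures taken from the paper, and the function names are hypothetical.

```python
def dynamic_power(c_eff, vdd, freq):
    """First-order CMOS dynamic power model: P = C_eff * Vdd^2 * f."""
    return c_eff * vdd ** 2 * freq


def multirate_power_ratio(decimation, cap_overhead, vdd_ref, vdd_low):
    """Power of a multirate design relative to the reference design.

    decimation   -- decimation factor M; the operators run at f / M
    cap_overhead -- assumed growth in switched capacitance (hypothetical)
    vdd_ref      -- supply voltage of the reference design (assumed)
    vdd_low      -- reduced supply voltage tolerated by the slower clock (assumed)
    """
    reference = dynamic_power(1.0, vdd_ref, 1.0)
    multirate = dynamic_power(cap_overhead, vdd_low, 1.0 / decimation)
    return multirate / reference


# Illustrative numbers only: M = 2, twice the capacitance, 5 V lowered to 3 V.
ratio = multirate_power_ratio(decimation=2, cap_overhead=2.0, vdd_ref=5.0, vdd_low=3.0)
print(f"relative power: {ratio:.2f}")
```

Under these assumed values the halved clock rate cancels the doubled capacitance, so the saving comes entirely from the squared supply-voltage term.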

300dpi TIFF Images of pages:

3267 3268 3269 3270

Acrobat PDF file of whole paper:

ic953267.pdf

A 2D-DCT Low-Power Architecture for H.261 Coders

Authors:

E. Scopa, DEIS Universita di Bologna (ITALY)
A. Leone, DEIS Universita di Bologna (ITALY)
R. Guerrieri, DEIS Universita di Bologna (ITALY)
G. Baccarani, DEIS Universita di Bologna (ITALY)

Volume 5, Page 3271

Abstract:

A low-power architecture for 2D-DCT is presented. It has been designed for portable H.261-compliant video-telephone applications, but most of the results and considerations apply to MPEG systems too. The presence of quantization in the coding process has been exploited, adapting the precision of DCT calculations to the quantization noise level. The proposed architecture has the capability of dynamically controlling power consumption by reducing the precision to the minimum required level and turning off sub-systems when they are not necessary for the computation. Compared with a standard implementation, power consumption is reduced by a factor between 7 and 10, without appreciable degradation of the transmission quality.

300dpi TIFF Images of pages:

3271 3272 3273 3274

Acrobat PDF file of whole paper:

ic953271.pdf

2-D DCT Using On-Line Arithmetic

Authors:

Javier Bruguera, University of Santiago de Compostela (SPAIN)
Tomas Lang, University of California-Irvine (USA)

Volume 5, Page 3275

Abstract:

We present a VLSI architecture for the evaluation of the (8x8)-point 2-D DCT with on-line arithmetic. The use of on-line arithmetic, in combination with an algorithm based on FCT and matrix multiplication, reduces the total hardware while maintaining a data rate and a latency similar to those of approaches based on distributed or parallel arithmetic. The architecture has been integrated in a chip using a 1 µm CMOS technology, occupying an area of 56.7 mm^2.
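
For readers unfamiliar with the matrix-multiplication view of the 2-D DCT, the following minimal sketch computes an 8x8 DCT as Y = C * X * C^T using the orthonormal DCT-II matrix. It is a plain floating-point reference, not the on-line arithmetic or fast algorithm of the paper, and all function names are hypothetical.

```python
import math

N = 8  # block size used in the abstract


def dct_matrix(n=N):
    """Orthonormal DCT-II matrix C with C[k][i] = a(k) * cos((2i + 1) k pi / (2n))."""
    rows = []
    for k in range(n):
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        rows.append([scale * math.cos((2 * i + 1) * k * math.pi / (2 * n))
                     for i in range(n)])
    return rows


def matmul(a, b):
    """Plain matrix product of two lists of lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def transpose(m):
    return [list(col) for col in zip(*m)]


def dct2(block):
    """2-D DCT of a square block as Y = C * X * C^T."""
    c = dct_matrix(len(block))
    return matmul(matmul(c, block), transpose(c))


# A constant block has all of its energy in the DC coefficient.
flat = [[128] * N for _ in range(N)]
coeffs = dct2(flat)
```

Because the transform is separable, the two matrix products correspond to a pass of 1-D DCTs over one dimension of the block followed by a pass over the other.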

300dpi TIFF Images of pages:

3275 3276 3277 3278

Acrobat PDF file of whole paper:

ic953275.pdf

Area Efficient Fast Huffman Decoder for Multimedia Applications

Authors:

Heonchul Park, Samsung Electronics Co. Ltd. (KOREA)
Jae-Chul Son, Samsung Electronics Co. Ltd. (KOREA)
Seong-Rae Cho, Samsung Electronics Co. Ltd. (KOREA)

Volume 5, Page 3279

Abstract:

An area-efficient VLSI architecture for a fast Huffman decoder that can support HDTV rates is proposed. Huffman coding has been widely used to reduce storage and channel bandwidth, and several image compression standards, such as JPEG and MPEG, require Huffman decoding to be performed in real time with high throughput. The proposed architecture requires fewer comparators and a smaller data rotator to simulate a barrel shifter. It can decode up to 17 bits per cycle with a 40 MHz clock, so it can decode a Huffman-coded sequence at up to 680 Mbps at peak, which is sufficient for HDTV rates. Compared with the parallel implementation in [3], which requires up to 1460 PEs and has a throughput of 10 Mbps, the proposed architecture is a single-PE design with competitive processing power. It requires 25% of the area of the known single-PE design in [3].
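
The peak rate quoted in this abstract follows from 17 bits per cycle at 40 MHz. The sketch below checks that arithmetic and shows a generic table-lookup decoder that consumes one whole codeword per step, which is the kind of operation a barrel shifter supports in hardware. The code table and function names are hypothetical, and this is not the architecture of the paper.

```python
PEAK_MBPS = 17 * 40  # 17 bits per cycle at 40 MHz, as stated in the abstract


def build_lookup(codes):
    """Expand a prefix code {symbol: bitstring} into a table indexed by
    every possible window of max_len bits."""
    max_len = max(len(bits) for bits in codes.values())
    table = {}
    for symbol, bits in codes.items():
        pad = max_len - len(bits)
        base = int(bits, 2) << pad
        for tail in range(1 << pad):
            table[base | tail] = (symbol, len(bits))
    return table, max_len


def decode(bitstring, codes):
    """Decode one codeword per step: look at max_len bits, then shift by
    the length of the matched codeword."""
    table, max_len = build_lookup(codes)
    padded = bitstring + "0" * max_len  # padding so the last window is full
    pos, out = 0, []
    while pos < len(bitstring):
        window = int(padded[pos:pos + max_len], 2)
        symbol, length = table[window]
        out.append(symbol)
        pos += length
    return out


# Hypothetical four-symbol prefix code, for illustration only.
CODES = {"a": "0", "b": "10", "c": "110", "d": "111"}
decoded = decode("0101100111", CODES)
```

The lookup assumes a complete prefix code, so that every window of max_len bits maps to exactly one symbol.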

300dpi TIFF Images of pages:

3279 3280 3281 3282

Acrobat PDF file of whole paper:

ic953279.pdf

A Programmable Motion Estimation Processor for Full Search Block Matching

Authors:

Alain Pirson, Thomson Consumer Electronics Components (FRANCE)
Fathy Yassa, Thomson Consumer Electronics Components (FRANCE)
Philippe Paul, Thomson Consumer Electronics Components (FRANCE)
Barth Canfield, TC Electronics (USA)
Friedrich Rominger, Thomson Consumer Electronics (GERMANY)
Andreas Graf, Thomson Consumer Electronics (GERMANY)
Detlef Teichner, Thomson Consumer Electronics (GERMANY)

Volume 5, Page 3283

Abstract:

This paper describes a programmable motion estimation processor applying a block matching technique on large search windows. Developed in the context of an MPEG-2 video encoder, its use can be extended to any application where fast motion estimation is required. Its high performance (17 Gops peak) and its ability to work in parallel make it ideal for real-time applications like video compression. The Block Matching Processor (BMP) consists of a CPU associated with several specific units including a fast motion estimator, a DRAM interface, IO ports and some on-chip memory. This approach combines the flexibility of a CPU with the efficiency of dedicated hardware. A DRAM controller minimizes the impact of data transfer on the computing power.

300dpi TIFF Images of pages:

3283 3284 3285 3286

Acrobat PDF file of whole paper:

ic953283.pdf

A RISC Controlled Motion Estimation Processor for MPEG-2 and HDTV Encoding

Authors:

D. Charlot, Thomson Consumer Electronics Components (FRANCE)
J.-M. Bard, Thomson Consumer Electronics Components (FRANCE)
B. Canfield, Thomson Consumer Electronics (USA)
C. Cuney, Thomson Consumer Electronics Components (FRANCE)
A. Graf, Thomson Consumer Electronics (GERMANY)
A. Pirson, Thomson Consumer Electronics (FRANCE)
D. Teichner, Thomson Consumer Electronics (USA)
F. Yassa, Thomson Consumer Electronics (FRANCE)

Volume 5, Page 3287

Abstract:

In this paper, we describe the architecture of a hierarchical motion estimation processor designed for the MPEG-2 encoding standard. This processor can also be used in HDTV applications. Motion estimation proceeds in two steps: first full-pixel, then half-pixel. Several modes are possible, depending on the image type (I, P or B in MPEG terminology; frame based or field based). The processor itself decides which mode is best. The architecture is based on a RISC controller, external DRAMs to store anchor frames, and specific hardware for processing the distortions. The architecture was chosen to achieve high performance, programmability and high memory bandwidth.

300dpi TIFF Images of pages:

3287 3288 3289 3290

Acrobat PDF file of whole paper:

ic953287.pdf

Associative Approach to Real Time Color, Motion and Stereo Vision

Authors:

Avidan Akerib, A.S.P. Solutions Ltd. (ISRAEL)
Rutie Adar, A.S.P. Solutions Ltd. (ISRAEL)

Volume 5, Page 3291

Abstract:

This article presents a new methodology, based on the Associative Signal Processing (A.S.P.) approach, for real-time parallel image processing. The architecture is fully programmable and can implement a wide range of color image processing, computer vision and multimedia algorithms at much faster than video rate. The approach is based on an array of thousands of processors, each of which is simply an "intelligent" memory word that can identify itself to a value and change its content accordingly. Benchmark results show that assigning an "intelligent" word (processor) to each pixel in the image yields a computational power of several hundred billion instructions per second. A chip based on this approach was developed by A.S.P. Solutions Ltd. The chip, called XIUM (tm), includes 1024 processors, each with 72 "intelligent" bits, has a computational power of 1 BIPS, and can identify patterns at a rate of 20 billion patterns per second. A commercial chip with 2 BIPS performance, the XIUM (tm)-II, is now in the final stage of development.
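
The idea of a memory word that identifies itself to a value and changes its content accordingly can be sketched as a masked compare followed by a masked write applied to every word. The loop below simulates sequentially what an associative array would do in parallel; the word layout, tag bit and function name are assumptions made for illustration, not details of the XIUM chip.

```python
def associative_step(words, mask, pattern, write_mask, write_value):
    """Apply one compare-then-write step to every word.

    A word responds when its bits selected by `mask` equal those of
    `pattern`; responders have the bits in `write_mask` replaced by the
    corresponding bits of `write_value`. Other words are left unchanged.
    """
    result = []
    for word in words:
        if (word & mask) == (pattern & mask):
            word = (word & ~write_mask) | (write_value & write_mask)
        result.append(word)
    return result


# Hypothetical layout: bits 0-7 hold a pixel value, bit 8 is a tag bit.
# Tag every pixel whose most significant bit is set (value >= 128).
pixels = [12, 200, 128, 127]
tagged = associative_step(pixels, mask=0x80, pattern=0x80,
                          write_mask=0x100, write_value=0x100)
```

In this model the cost of a step does not depend on the number of words, which is why one word per pixel scales with image size.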

300dpi TIFF Images of pages:

3291 3292 3293 3294

Acrobat PDF file of whole paper:

ic953291.pdf

A High-Throughput, Flexible VLSI Architecture for Motion Estimation

Authors:

Chin-Liang Wang, National Tsing Hua University (REPUBLIC OF CHINA)
Ker-Min Chen, National Tsing Hua University (REPUBLIC OF CHINA)
Jin-Min Hsiung, National Tsing Hua University (REPUBLIC OF CHINA)

Volume 5, Page 3295

Abstract:

This paper presents a new systolic VLSI architecture to realize the full-search block matching algorithm for motion estimation. The architecture has an efficiency of 100 percent and a throughput of one motion vector per n^2 cycles, where n x n is the reference block size. Compared with existing VLSI motion estimators with the same efficiency and throughput, the proposed one not only gains advantages in the flexibility of changing the reference block size and the tracking range, but also employs no additional control circuitry to determine the motion vectors. These features make it useful for a wide range of applications.
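
For reference, the full-search block matching algorithm that this and the following architectures implement can be sketched in software as below. The sum of absolute differences (SAD) is assumed as the matching criterion, the frame data is synthetic, and the function names are hypothetical; nothing here models the systolic hardware itself.

```python
def sad(cur, ref, bx, by, dx, dy, n):
    """Sum of absolute differences between the n x n block at (bx, by) in
    the current frame and the block displaced by (dx, dy) in the reference."""
    total = 0
    for y in range(n):
        for x in range(n):
            total += abs(cur[by + y][bx + x] - ref[by + dy + y][bx + dx + x])
    return total


def full_search(cur, ref, bx, by, n, p):
    """Return (dx, dy, sad) minimising SAD over every displacement in
    [-p, p] x [-p, p] that keeps the candidate block inside the frame."""
    height, width = len(ref), len(ref[0])
    best = None
    for dy in range(-p, p + 1):
        for dx in range(-p, p + 1):
            inside = (0 <= by + dy and by + dy + n <= height and
                      0 <= bx + dx and bx + dx + n <= width)
            if not inside:
                continue
            cost = sad(cur, ref, bx, by, dx, dy, n)
            if best is None or cost < best[2]:
                best = (dx, dy, cost)
    return best


# Synthetic frames: the current frame is the reference shifted by (2, 1).
ref = [[x * x + 17 * y * y for x in range(8)] for y in range(8)]
cur = [[ref[min(y + 1, 7)][min(x + 2, 7)] for x in range(8)] for y in range(8)]
best = full_search(cur, ref, bx=2, by=2, n=2, p=2)
```

Each candidate costs n^2 absolute differences and there are up to (2p + 1)^2 candidates per block, which is why dedicated parallel hardware is attractive for this search.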

300dpi TIFF Images of pages:

3295 3296 3297 3298

Acrobat PDF file of whole paper:

ic953295.pdf

Semi-Systolic Array Based Motion Estimation Processor Design

Authors:

Mei-Cheng Lu, National Chiao Tung University (REPUBLIC OF CHINA)
Chen-Yi Lee, National Chiao Tung University (REPUBLIC OF CHINA)

Volume 5, Page 3299

Abstract:

This paper presents a new VLSI architecture for the full-search block matching algorithm. The proposed architecture has two specific features: (1) it has a processor element (PE) array that provides sufficient computational power, in which the PEs work in a semi-systolic style, and (2) it contains stream memory banks that provide scheduled data flow to reduce idle operations within the PE array. By exploiting broadcasting and local data communications, the hardware efficiency of the proposed architecture can reach 100%, which outperforms the systolic-array solutions found in the literature. This efficient VLSI architecture is then demonstrated by a 16x16 motion estimation processor design whose speed can reach 100 MHz in a 0.8 µm double-metal CMOS process.

300dpi TIFF Images of pages:

3299 3300 3301 3302

Acrobat PDF file of whole paper:

ic953299.pdf

A Novel Modular Systolic Array Architecture for Full-Search Block Matching Motion Estimation

Authors:

Hangu Yeo, University of Wisconsin (USA)
Yu Hen Hu, University of Wisconsin (USA)

Volume 5, Page 3303

Abstract:

In this paper, we propose a modular systolic array architecture for the full-search block matching motion estimation algorithm (FBMA). With this novel architecture, we are able to generate a motion vector for every reference block in raster scan order while achieving 100% processor utilization and a high throughput rate. Furthermore, we devise a scheme that reduces the pin count (I/O) by sharing memory units, which results in low memory bandwidth. The architecture is scalable in that it can easily be adapted to handle larger search ranges and different block sizes without increasing the effective latency.

300dpi TIFF Images of pages:

3303 3304 3305 3306

Acrobat PDF file of whole paper:

ic953303.pdf