# A LOW POWER RECONFIGURABLE DCT ARCHITECTURE TO TRADE OFF IMAGE QUALITY FOR COMPUTATIONAL COMPLEXITY

Jongsun Park and Kaushik Roy

School of Electrical and Computer Engineering, Purdue University West Lafayette, IN 47907, USA {jongsun, kaushik}@ecn.purdue.edu

## ABSTRACT

We present a low power reconfigurable DCT design, which achieves considerable computational complexity reduction in DCT operation with minimum image quality degradation. The approach is based on the modification of DCT bases in a bit-wise manner. Different computational complexity/image quality trade off levels are presented and a reconfigurable architecture, which can dynamically change from one trade off level to another, is also proposed. The reconfigurable DCT architecture can achieve power savings ranging from 20% to 70% for 5 different trade off levels.

## 1. INTRODUCTION

With the recent explosive growth of multimedia mobile services, there is a strong demand for image/video transmission in the next generation wireless applications. One of the major difficulties in designing wireless multimedia communication systems is the high energy requirement, which surpasses the current battery capabilities. In general, many image compression effort is aimed at keeping the distortion of reconstructed image as low as possible for a given bit rate. However, in applications such as portable multimedia devices, the best image quality may not always be required. Therefore, algorithmic/architectural approach for low power design, which is based on trade off between image quality and power consumption is clearly required.

Discrete Cosine Transform (DCT), which involves decomposing a set of image samples into a scaled set of discrete cosine basis functions, is one of major operations in current image/video compression standards. Several techniques have been proposed for making trade off between image quality and computation energy. In [1], the authors focus on the Joint Photographic Experts Group (JPEG) image compression algorithm and present the results of varying some image compression parameters on energy dissipation, required bandwidth and image quality. In [2, 3], instead of reconstructing each pixel by summing up all the frequency components in the DCT algorithm, the authors propose an approach which incrementally accumulates the image based on spectral contributions from more significant low frequency components and removes less significant high frequency components due to energy requirement.

We propose a low-power reconfigurable DCT architecture, which is based on efficient trade off between imagequality and computational-complexity. The low-power approach is based on the modification of DCT bases in a bitwise manner with minimum image quality degradation and considerable computational complexity reduction can be accomplished using the proposed scheme. We also present a reconfigurable DCT architecture, which can add the right degree of flexibility for low complexity DCT design. The low power technique proposed in this paper can be independently used with the techniques mentioned above [1, 2, 3] to obtain additional power consumption reduction.

#### 2. GENERAL DCT BASED IMAGE COMPRESSION PROCESS.

DCT is widely used in current international image/video coding standards such as JPEG, Motion Picture Experts Group (MPEG), the H.261, and H.263 video-telephony coding schemes. One of the advantages of DCT is its energy compaction property, which means that the signal energy is concentrated on a few components while most other components are zero or negligible.



Figure 1: DCT based source encoder procedure.

Figure 1 shows the general DCT based source encoder structure. As shown in the figure, the original source image is partitioned into  $8 \times 8$  blocks. Then, each block is transformed through a Forward Discrete Cosine Transform (FDCT) process and the output is quantized by uniform quantizer. The 2-D DCT process can be decomposed to an 1-D DCT followed by a transposition and a second 1-D DCT. The 1-D DCT transform is expressed as

$$z_{k} = \frac{c(k)}{2} \sum_{i=0}^{7} x_{i} \cos \frac{(2i+1)k\pi}{16}, \quad k = 0, 1, 2, \dots, 7, \quad (1)$$
$$c(k) = \begin{cases} 1/2, & k = 0\\ 1, & otherwise. \end{cases}$$

This research was funded in part by DARPA and by Semiconductor Research Corporation.

| Basis | Real Value | Binary number | CSD number               |
|-------|------------|---------------|--------------------------|
| a     | 0.4904     | 0011 1111     | $0100 \ 000\overline{1}$ |
| b     | 0.4619     | 0011 1011     | 0100 1011                |
| c     | 0.4157     | 0011 0101     | 0011 0101                |
| d     | 0.3536     | 0010 1101     | 0010 1101                |
| e     | 0.2778     | 0010 0100     | 0010 0100                |
| f     | 0.1913     | $0001\ 1000$  | 0001 1000                |
| g     | 0.0975     | 0000 1100     | $0001 \ 0\overline{1}00$ |

Table 1: 8-bit DCT bases.

This equation is represented in vector-matrix form as

$$z = T \cdot x. \tag{2}$$

Since  $8 \times 8$  coefficients matrix T in equation (2) is symmetric, the 1-D DCT matrix can be rearranged and expressed as

$$\begin{bmatrix} z_0 \\ z_2 \\ z_4 \\ z_6 \end{bmatrix} = \begin{bmatrix} d & d & d & d \\ b & f & -f & -b \\ d & -d & -d & d \\ f & -b & b & -f \end{bmatrix} \begin{bmatrix} x_0 + x_7 \\ x_1 + x_6 \\ x_2 + x_5 \\ x_3 + x_4 \end{bmatrix} (3)$$
$$\begin{bmatrix} z_1 \\ z_3 \\ z_5 \\ z_7 \end{bmatrix} = \begin{bmatrix} a & c & e & g \\ c & -g & -a & -e \\ e & -a & g & c \\ g & -e & c & -a \end{bmatrix} \begin{bmatrix} x_0 - x_7 \\ x_1 - x_6 \\ x_2 - x_5 \\ x_3 - x_4 \end{bmatrix} (4)$$

where  $c_k = \cos \frac{n\pi}{16}$ ,  $a = c_1$ ,  $b = c_2$ ,  $c = c_3$ ,  $d = c_4$ ,  $e = c_5$ ,  $f = c_6$ , and  $g = c_7$ .

As shown in Figure 1, the outputs of the 2-D DCT are quantized by a 64-element normalization matrix [4]. The visually important low frequency 2-D DCT coefficients, located in the top left region of the array, are quantized with short quantization steps while the rest of the coefficients are coarsely quantized.

#### 3. COMPUTATIONAL COMPLEXITY/IMAGE QUALITY TRADE OFF ALGORITHM

In this section, we present a methodology for making an efficient trade off between image quality and computational complexity. The approach is based on the modification of DCT bases in a bit-wise manner with minimum degradation in image quality. Different computationalcomplexity/image-quality trade off levels are presented.

The 1-D DCT matrix multiplication operation shown in (3) and (4) can be expressed as add and multiplication operations. Since any multiplication can be translated to series of add operations, the basic operations in 1-DCT matrix multiplication are add operations. Table 1 shows the 8 bit quantized DCT bases. When we implement the 1-D DCT operation using the basic add operations, the number of add operations required is one less than the total number of non-zero digits of the DCT bases shown in Table 1. Hence, we can use the number of non-zero digits as a measure of computational complexity. For the image quality measure, we will use Peak Signal to Noise Ratio (PSNR).

In order to reduce the number of non-zero digits in the DCT bases, we represent the DCT bases with Canonical

Signed Digit (CSD). The DCT bases in CSD format are shown in the rightmost column of Table 1. Once the DCT bases are represented in CSD format, we further reduce the number of non-zero digits in DCT bases at the expense of image quality (PSNR) degradation by changing the nonzero digits to zeroes.

When we make a trade off between computational complexity and image quality by reducing the number of nonzero digits in DCT bases, the idea is to achieve minimum image quality degradation for the same reduction in the number of adders. Let us define sensitivity of each nonzero digit in DCT bases as PSNR degradation of an image when the non-zero digit is modified to zero. The main idea is based on the fact that the non-zero digits in DCT bases have different sensitivities. In other words, for some nonzero digits, changing those to zeroes can give rise to a large PSNR degradation of an image while for other non-zero digits, it gives rise to a negligible PSNR degradation. For example, since the DCT basis d in Table 1 is most frequently used in 1-D DCT matrix multiplication in (3) and determines the DC component, modifying the non-zero digits in d to zeroes gives rise to a large PSNR degradation.

We reduce the number of non-zero digits in DCT bases based on the sensitivity measure of each non-zero digit. A greedy algorithm [5] is used. Among the 20 non-zero digits in DCT bases shown in Figure 2(a), the least sensitive non-zero digit is searched and it is changed to zero. Again, the next least sensitive non-zero digit is searched over the remaining 19 non-zero digits and modified to zero. As the number of non-zero digits in DCT bases decreases, the number of addition operations in 1-D DCT matrix multiplication also decreases with minimum image quality degradation.

| Original DCT bases                                                                                                                                               | Trade off Level 1                                                                                                                                                                                                                                                                                                                                                     | Trade off Level 2                                                                                                                                                                                                                                                                             |  |  |
|------------------------------------------------------------------------------------------------------------------------------------------------------------------|-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|--|--|
| $ \begin{array}{cccccccccccccccccccccccccccccccccccc$                                                                                                            | $\begin{array}{cccccccccccccccccccccccccccccccccccc$                                                                                                                                                                                                                                                                                                                  | $ \begin{array}{cccccccccccccccccccccccccccccccccccc$                                                                                                                                                                                                                                         |  |  |
|                                                                                                                                                                  |                                                                                                                                                                                                                                                                                                                                                                       |                                                                                                                                                                                                                                                                                               |  |  |
| Trade off Level 3                                                                                                                                                | Trade off Level 4                                                                                                                                                                                                                                                                                                                                                     | Trade off Level 5                                                                                                                                                                                                                                                                             |  |  |
| Trade off Level 3<br>a = 0 1 0 0 0 0 0 0 0                                                                                                                       | Trade off Level 4<br>$a = 0 \ 1 \ 0 \ 0 \ 0 \ 0 \ 0 \ 0$                                                                                                                                                                                                                                                                                                              | Trade off Level 5<br>a = 0 1 0 0 0 0 0 0                                                                                                                                                                                                                                                      |  |  |
| Trade off Level 3 $a = 0$ 1 0 0 0 0 0 $b = 0$ 1 0 0 0 0 0 0                                                                                                      | Trade off Level 4 $a = 0$ 1   0   0   0   0   0 $b = 0$ 1   0   0   0   0   0   0                                                                                                                                                                                                                                                                                     | Trade off Level 5 $a = 0$ 1   0   0   0   0   0 $b = 0$ 1   0   0   0   0   0   0                                                                                                                                                                                                             |  |  |
| Trade off Level 3 $a = 0$ 1   0   0   0   0   0 $b = 0$ 1   0   0   0   0   0   0 $c = 0$ 0   1   1   0   0   0   0                                              | Trade off Level 4 $a = 0$ 1   0   0   0   0   0 $b = 0$ 1   0   0   0   0   0   0 $c = 0$ 0   1   0   0   0   0   0                                                                                                                                                                                                                                                   | $\begin{array}{ccccccc} {\bf Trade \ off \ Level \ 5} \\ a = 0 & 1 & 0 & 0 & 0 & 0 & 0 & 0 \\ b = 0 & 1 & 0 & 0 & 0 & 0 & 0 & 0 \\ c = 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \end{array}$                                                                                                             |  |  |
| $\begin{array}{c ccccccccccccccccccccccccccccccccccc$                                                                                                            | $\begin{array}{ccccc} \textbf{Trade off Level 4} \\ a=0 & 1 & 0 & 0 & 0 & 0 & 0 \\ b=0 & 1 & 0 & 0 & 0 & 0 & 0 \\ c=0 & 0 & 1 & 0 & 0 & 0 & 0 & 0 \\ c=0 & 0 & 1 & 0 & 0 & 0 & 0 & 0 \\ \end{array}$                                                                                                                                                                  | $\begin{array}{ccccc} \textbf{Trade off } \textbf{Level S} \\ a=0 & 1 & 0 & 0 & 0 & 0 & 0 & 0 \\ b=0 & 1 & 0 & 0 & 0 & 0 & 0 & 0 \\ c=0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\ e=0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \end{array}$                                                                           |  |  |
| $ \begin{array}{c ccccccccccccccccccccccccccccccccccc$                                                                                                           | $\begin{array}{c ccccccccccccccccccccccccccccccccccc$                                                                                                                                                                                                                                                                                                                 | $\begin{array}{ccccc} \textbf{Trade off } \textbf{Level S} \\ a = 0 & 1 & 0 & 0 & 0 & 0 & 0 & 0 \\ b = 0 & 1 & 0 & 0 & 0 & 0 & 0 & 0 \\ c = 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\ e = 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\ f = 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \end{array}$                              |  |  |
| $ \begin{array}{c ccccccccccccccccccccccccccccccccccc$                                                                                                           | $\begin{array}{ccccc} \textbf{Trade off Level J} \\ a = 0 & 1 & 0 & 0 & 0 & 0 & 0 \\ b = 0 & 1 & 0 & 0 & 0 & 0 & 0 \\ c = 0 & 0 & 1 & 0 & 0 & 0 & 0 & 0 \\ c = 0 & 0 & 1 & 0 & 0 & 0 & 0 & 0 \\ f = 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\ g = 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \end{array}$                                                                                   | $\begin{array}{cccccc} \textbf{Trade off } Level S \\ a = 0 & 1 & 0 & 0 & 0 & 0 & 0 & 0 \\ b = 0 & 1 & 0 & 0 & 0 & 0 & 0 & 0 \\ c = 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\ e = 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\ f = 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\ g = 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \end{array}$ |  |  |
|                                                                                                                                                                  | $\begin{array}{cccccc} \text{Trade off Level J} \\ a = 0 & 1 & 0 & 0 & 0 & 0 & 0 \\ b = 0 & 1 & 0 & 0 & 0 & 0 & 0 \\ c = 0 & 0 & 1 & 0 & 0 & 0 & 0 & 0 \\ e = 0 & 0 & 1 & 0 & 0 & 0 & 0 & 0 \\ f = 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\ g = 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\ d = 0 & 0 & 1 & 0 & 1 & 1 & 0 & 1 \end{array}$                                               | $\begin{array}{cccccccccccccccccccccccccccccccccccc$                                                                                                                                                                                                                                          |  |  |
| IFrade off Level 3 $a = 0$ 1 0 0 0 0 0 0 $b = 0$ 1 0 0 0 0 0 0 0 $c = 0$ 0 1 1 0 0 0 0 0 $e = 0$ 0 1 1 0 0 0 0 0 $g = 0$ 0 0 1 0 1 0 0 0 $d = 0$ 0 1 0 1 1 0 1 1 | $\begin{array}{ccccccc} \text{Trade off Level J} \\ a = 0 & 1 & 0 & 0 & 0 & 0 & 0 & 0 \\ b = 0 & 1 & 0 & 0 & 0 & 0 & 0 & 0 \\ c = 0 & 0 & 1 & 0 & 0 & 0 & 0 & 0 \\ e = 0 & 0 & 1 & 0 & 0 & 0 & 0 & 0 \\ f = 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\ g = 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\ d = 0 & 0 & 1 & 0 & 1 & 1 & 0 & 1 \\ \# \text{ of non zero digits = S} \end{array}$ | $\begin{array}{c ccccccccccccccccccccccccccccccccccc$                                                                                                                                                                                                                                         |  |  |

Figure 2: DCT bases for different trade off levels.

Using the proposed method, we have simulated 10 different images. Based on the simulation results, various sets of DCT bases for making image-quality/hardwarecomplexity trade off are presented. The original DCT bases in CSD format are shown in Figure 2(a) and the selected non-zero digits in the block show the four least sensitive

|         | original | level 1 | level 2 | level 3 | level 4 | level 5 |
|---------|----------|---------|---------|---------|---------|---------|
| lena    | 33.0     | 32.9    | 32.7    | 32.6    | 30.5    | 27.9    |
| peppers | 34.9     | 34.6    | 34.1    | 34.0    | 30.9    | 27.4    |
| monarch | 34.05    | 33.85   | 33.2    | 33.1    | 29.48   | 26.86   |

Table 2: PSNR(dB)'s for 3 images for different trade off

levels.



Figure 3: (a) Original lena image. (b) lena image in level 1. (c) lena image in level 2. (d) lena image in level 3. (e) lena image in level 4. (f) lena image in level 5.

non-zero digits that are changed to zero in trade off level 1. Likewise, the selected digits in the block shown in Figure 2(b) show the three least sensitive non-zero digits, which are changed to zero in trade off level 2. The number of non-zero digits in a block can be changed depending on the required number of trade off levels. We present five different trade off levels in this paper. <sup>1</sup> As the trade off level goes higher, the image quality degrades with considerable computational complexity reduction. We can also notice from Figure 2 that we did not modify d since it has a highest sensitivity.

Figure 3 shows the Lena images for different imagequality/computational complexity trade off levels and Table 2 shows the PSNR's of 3 images for different trade off levels. Figure 4 shows the graph showing the image quality degradation vs. the number of addition reduction in DCT operation. The graph shows considerable reduction of addition operations with image quality degradation. The image quality abruptly decreases at level 5 with around 60% of total number of addition reduction.



Figure 4: Image quality degradation vs. reduction in the number of additions.

#### 4. RECONFIGURABLE DCT ARCHITECTURE

Based on the proposed image-quality/computationalcomplexity trade off algorithm shown in the previous section, we propose a reconfigurable DCT architecture. As shown in Figure 2, the number of addition operations for calculating 1-D DCT is significantly reduced at the expense of image quality degradation. In this section, using an architectural level technique, further reduction in power dissipation is achieved. The reconfigurable DCT architecture can dynamically change from one trade off level to another with little overhead.

Let us take an example for calculating  $z_1$  in 1-D DCT matrix multiplications shown in (3) and (4). From equations (3) and (4),  $z_1$  can be expressed as

$$z_1 = a(x_0 - x_7) + c(x_1 - x_6) + e(x_2 - x_5) + g(x_3 - x_4).$$
 (5)

Figure 5 shows the architecture for calculating  $z_1$ . The figure also shows the modification of DCT bases for each trade off level and the corresponding architecture. As shown in the figure, AND gates are added to the input signals. The AND gates are efficiently used for reconfiguration of hardware. When the level control signals are ones, the AND gates pass the inputs and the circuit works like a normal DCT processor without any modification of DCT bases. When the level control signals are zeroes, the AND gates simply make both inputs of the adders zeroes and turn off the adders. In Figure 5, the adders in the dotted boxes show the adders turned off at each trade off level.

Figure 5 shows a balanced carry save adder tree structure, while Figure 6 shows a unbalanced one. For both the unbalanced and balanced architectures, the required number of additions are 9 without any DCT basis modification. When we do not make any trade off (when using the original DCT bases), the balanced architecture shows better performance since it has a shorter critical path compared to the unbalanced one. However, as we make trade off by modifying the DCT bases, the unbalanced architecture becomes better in terms of performance and power consumption. For example, in the trade off level 4, only 3 adders lie on the critical path with 6 adders turned off in the unbalanced architecture. However, in the balanced architecture, 4 adders lie on the critical path with 5 adders turned off in trade off

<sup>&</sup>lt;sup>1</sup>In trade off level 1 and 2, which reduce 4 and 7 non-zero digits respectively, brute force search generates different sets of non-zero digits for minimum PSNR's. However, the difference between the minimum PSNR obtained using greedy algorithm and that using brute force search is negligibly small. Moreover, greedy algorithm finds a set of non-zero digits which makes a reconfigurable DCT implementation easier. The reconfigurable DCT architecture is presented in section 4.



Figure 5: Balanced carry save adder architecture for calculating  $z_1$ .

Table 3: Power consumption of 1-D DCT block for different trade off levels.

|             | original | level 1            | level 2  |
|-------------|----------|--------------------|----------|
| Power       | 81.4  mW | $68.8 \mathrm{mW}$ | 56.2  mW |
| Consumption | level 3  | level 4            | level 5  |
|             | 44.1  mW | 33.0  mW           | 24.7  mW |

level 4. Moreover, in trade off level 5, only 1 adder lies on the critical path with 8 adders turned off in the unbalanced architecture. In the balanced architecture, 3 adders lie on the critical path with 6 adders turned off in trade off level 5.

Since carry save adders are used for implementing both balanced and unbalanced adder trees, the performance disadvantage of the unbalanced architecture is only two full adder delay. Furthermore, considering the power consumption advantage of the unbalanced adder structure, unbalanced adder tree structures are used in our DCT implementation.

The proposed DCT architecture is implemented using 0.25  $\mu$  tsmc library and power consumption is measured on the spice netlist level simulation with *Powermill*. Table 3 shows the power consumption of the proposed 1-D DCT architecture for different trade off levels. At trade off level 3, the proposed architecture shows around 47% power savings with small image quality degradation. When further low power DCT operation is required at the expense of image quality degradation, the reconfigurable DCT architecture can be changed to trade off level 5, which has 70% power savings compared to the normal DCT operation. Depending on the required amount of power consumption reduction, the proposed scheme allows the selection of different DCT architectures, thus achieving considerable power savings at the expense of acceptable image quality degradation.



Figure 6: Unbalanced carry save adder architecture for calculating  $z_1$ .

# 5. CONCLUSION

We present a low power reconfigurable DCT architecture which is based on the computation-energy/image-quality trade off. Various trade off levels are presented and the proposed reconfigurable architecture can dynamically change from one trade off level to another without large hardware overhead. Consequently, the reconfigurable DCT leads to 47% power savings without seriously compromising the image quality. More computation power saving can be achieved using the proposed architecture with additional image quality degradation. The idea presented in this paper can assist in the design of DCT algorithm and its implementation for low power applications.

#### 6. REFERENCES

- C.N.Taylor and S.Dey, "Adaptive Image Compression for Enabling Mobile Multimedia Communication," in Proc. IEEE Intl. Conference on Communications (ICC), June 2001, pp. 1925-9 vol.6.
- [2] A. Sinha, A. Wang and A. Chandrakasan, "Energy Scalable System Design," IEEE Trans. on VLSI Systems, April, 2002, vol. 10, No. 2.
- [3] J. Bracamonte, M. Ansorge and F. Pellandini, "VLSI systems for image compression. A power consumption/image resolution trade-off approach," Proc.of Conf. on Digital Compression Technologies and Systems for Video Communication, SPIE Vol. 2952, Berlin, Germany, Oct. 7-11, 1996, pp 591-596.
- [4] ITU-T Recommendation T.81, Digital Compression and coding of continuous-tone still images, September, 1992.
- [5] T. H. Cormen, C. E. Leiserson and R. L. Rivest, "Introduction to Algorithms" The MIT press, Cambridge, 1990.