# A GENERALIZED ANALOG ARCHITECTURE FOR DCT, DST AND ITS INVERSE

Ashis Kumar Mal and Arindam Basu

Department of E & ECE, IIT Kharagpur, India, Email: akmal@ece.iitkgp.ernet.in

# ABSTRACT

This paper describes a sampled analog architecture, for computing DCT or DST, using switched capacitor principle with capacitance switching. The input sample stream is applied to an array of capacitors and multiplied by all the DCT/DST coefficients concurrently using capacitor ratios. These capacitors are switched concurrently with the help of a switching matrix, to realize switched capacitor integrators for performing necessary addition/ subtraction. The architecture may also be used for computing inverse DCT and DST transforms. Proposed architecture simple, regular and can be used for on-line computations, with good accuracy.

# 1. INTRODUCTION

The discrete cosine transforms (DCT) and discrete sine transform (DST) are widely applied to processing, computation, and many other areas related to speech and image signals. It is found that in the case of images with high correlation coefficient, DCT based coding results in better performance but for low correlation images, DST gives lower bit rate [1]. Thus, it is preferable to design a VLSI hardware with the capability to compute both DCT and DST (DC(S)T) using the same resources. Though there exist several architectures for computing these transforms separately, no attempts are found to combine DCT and DST into one architecture using analog blocks. An analog architecture suggested for DCT/DFT[2], uses the principle of charge scaling ADC and requires large number of matched capacitors to realize each coefficient. It also uses extra opamps just for handling negative coefficients. A two dimensional DCT structure has also been reported[3], which uses fixed array of simple switched capacitor(SC) blocks, and requires simultaneous application of all analog samples, with little consideration to switch sharing. Negative coefficients are handled there, by inverting the supply voltage which has its own demerits.

In this paper, an architecture is proposed to compute DC(S)T in *time-multiplexed* method[4][5], based on '*swit-ched*' -switched capacitor structure, where capacitors are switched according to the incoming sample sequence. Here each sample is applied to charge an array of N input capacitors, which is proportional to different kernel coefficients.

The charge of these input capacitors then switched to another set of identical output capacitors to perform weighted multiplication and addition (MAC) concurrently. For an N length transform, N multiplied values are steered through a switching-matrix in such a way that an SC integrator array is realized dynamically. Depending on the transform kernel, switching between input and output capacitors are changed for each incoming samples and thus entire computation is performed stepwise. The proposed scheme is simple, regular, configurable, with maximum component sharing and easy to implement using mixed-signal VLSI process.

## 2. THEORY

The one dimensional DCT of a sequence f(n), for  $n = 0, 1, \ldots, N-1$ , is given by

$$C(k) = M_k^c \sum_{n=0}^{N-1} \cos\left[\frac{(2n+1)k\pi}{2N}\right] f(n)$$
 (1)

And for DST the expression is

$$S(k) = M_k^s \sum_{n=0}^{N-1} \sin\left[\frac{(2n+1)(k+1)\pi}{2N}\right] f(n) \quad (2)$$

where  $k = 0, 1, \dots N - 1$  and  $M_k^c$  and  $M_k^s$  are constants. Neglecting the constant factor, above equations can be computed recursively as

$$H_{i+1}(k) = H_i(k) + U_i(k)$$
(3)  
[i = 0, 1, ..., N - 1]

where  $H_0(k) = 0$  and desired  $C(k)[S(k)] \equiv H_{N-1}(k)$  and DC(S)T update term is given by

$$U_i(k) = \delta \cos\left[\frac{(2i+1)l\pi}{2N}\right] f(i) \quad \text{for DC(S)T} \quad (4)$$

$$= +\cos(m\theta).f(i)$$
 for DCT (5)

where l can be either k or N-(k+1), and sign term  $\delta$  is 1 or  $(-1)^i$  depending on DCT or DST with  $\theta = \frac{\pi}{2N}$ . For DCT, m = [(2i+1)k] modulo 4N, thus  $0 \le m\theta < 2\pi$ . We

can however apply further trigonometric transform  $m \to r$ such that all cosine values are mapped to first quadrant so that  $r \in [0, N - 1]$ . Such transformation for DCT is done using basic trigonometric identity as follows.

$$\cos(r\theta) = \begin{cases} +\cos(m\theta) & \text{if} \quad 0 \le m \le N \\ -\cos(2N-m)\theta & \text{if} \quad N < m \le 2N \\ -\cos(m-2N)\theta & \text{if} \quad 2N < m \le 3N \\ +\cos(4N-m)\theta & \text{if} \quad 3N < m < 4N \end{cases}$$

For DCT, this enables (5) to be rewritten as

$$U_i(k) = \delta_{ik} \cos(r\theta) f(i) \tag{6}$$

where

$$\delta_{ik} = \begin{cases} -1 & \text{if } N < m \le 3N \\ +1 & \text{otherwise} \end{cases}$$
(7)

Using (2), (4) and (5) update term DST can be shown as

$$U_i(k)|_{DST} = (-1)^i . U_i(N-1-k)|_{DCT}$$
(8)

Which implies, same  $U_i(k)$  can be used for DC(S)T, but for computing k-th or (N - k - 1)-th update component with sign term for DST computed by

$$\delta_{ik}|_{DST} = \begin{cases} +\delta_{ik} & \text{for } i \text{ even} \\ -\delta_{ik} & \text{for } i \text{ odd} \end{cases}$$
(9)

Finally, (6) can be expressed as

$$U_i(k) = \delta_{ik} [S_{ik}] [t_k]^T f(i)$$
(10)

where  $[S_{ik}]$  denotes an N dimensional vector whose only one element is 1 and it's position depends on i, k. And  $[t_r]$  is another  $1 \times N$  vector, comprising discrete (co)sine samples given by

$$[t_r] = \left[\cos\left(\frac{r\pi}{2N}\right)\right] \tag{11}$$

Using above formulation, we can design a combined architecture to compute DCT or DST which is essentially controlled through  $S_{ik}$  and  $\delta_{ik}$  signals. Again, both these signals are binary signals, hence, can be easily generated, multiplexed and manipulated digitally. The same architecture may be used for computing inverse transforms because of their similar kernels.



Fig. 1. Basic Principle

### **3. ARCHITECTURE**

In this section we propose an analog architecture to compute N point DCT/DST based on the above formulations. Basic scheme is shown in Fig 1. The core blocks are (i) a capacitor array, to set to N (co)sine values, (ii) an accumulator array, comprising amplifiers and feed back capacitors (iii) a two-dimensional cross-point switch with a (iv) digital controller to carry out the operation recursively with dynamic selection sampling capacitors. The integrator is used here in both non-inverting and inverting mode to manipulate both -ve/+ve coefficients needed for computation.

## 3.1. Capacitor Array

The heart of the architecture is an array of N capacitors, each acting as sampling capacitor of the integrators. The ratio of these sampling capacitors to output(feedback) capacitors will set the DCT/ DST kernel coefficients. The sampling capacitor may be changed, *dynamically* for each incoming sample realizing different coefficients required for the computation. Basic scheme is shown in Fig. 1. Based on  $i^{th}$  sample f(i) and  $k^{th}$  component C(k)[S(k)], desired input capacitor( $C_m$ ) is connected to the output capacitor. Thus gain of the block becomes proportional to  $C_m/C_o$ which may be switched to any of the N possible values. The entire computation is based on sum of weighted coefficients and each transform point(C(k)[S(k)]) is therefore generated by switching through the required set of capacitors. Therefore, for each of C(k)[S(k)], one  $M \times 1$ switch is needed provided it requires a summation of M distinct coefficients. From the properties of the DCT/DST matrix, it is found that all these distinct coefficients  $\cos(m\theta)$ s are required by all the C(k)[S(k)] concurrently for all the samples f(i) but with different permutation of m. So, an  $N \times N$  connects N input to N output capacitors (with amplifier), with different connectivity at different intervals. All N points of the DCT or DST thus can be computed using same capacitor array but with different capacitor switching sequence.

Now this philosophy is applied to a SC structure shown in Fig. 2. If we can switch among various sampling capacitors, an analog MAC can be realized which is described in the following sections.

#### 3.2. Integrator

As mentioned earlier, depending on i and k, capacitor  $C_m$  is selected and connected to the output for necessary addition/subtraction to/from  $H_{i-1}(k)$  depending on  $\delta_{ik}$ . N number of SC integrators have been realized in an innovative manner, to implement this operation. Its principle is same as of a typical SC integrator, however non-overlapping clocks( $\phi_1$  and  $\phi_2$ ) are switched via a 2 : 1 multi-



Modified Inverting/Non-inverting Integrator

plexer, using a control bit, to realize non-inverting/ inverting integration in the same circuit as shown in Fig. 2. For  $k^{th}$  integrator, it adds(subtracts) update terms to  $H_{i-1}(k)$  in non-inverting(inverting) mode expressed as:

$$H_i(k) = H_{i-1}(k) \pm (C_m/C_o) \cdot f(i); \pm \text{ integrator} \quad (12)$$

where,  $i, k = 0, 1, 2 \dots N - 1$ . When capacitor  $C_m$  is switched through  $S_{ik}$ , (12) can rewritten as,

$$H_i(k) = H_{i-1}(k) + \delta_{ik} \cdot [S_{ik}] [\cos\left(\frac{m\pi}{2N}\right)]^T \qquad (13)$$

Fig. 3 shows the schematic of SC integrator to be used in the design with necessary control signals. The MOS transistor(switch)  $M_2$ , is connected to one of the accumulators(output amplifier and  $C_o$ ). When N such accumulators  $(C_o)$ , are connected using N parallel, switches $(M_2)$ , each  $C_m$  can be connected any of the accumulators. A DE-MUX arrangement is shown here to understand the operation. This DEMUX, depending on the control input  $[S_{ik}]$ , will activate one of the parallel switch( $M_2$ ) by  $\phi_1$  leaving all other switches OFF. Thus a complete SC integrator is formed with input capacitors  $(C_m)$  and output capacitors  $(C_o)$ , through  $k^{th}$  switch  $M_2$ . Alternatively, N number of separate switches are also required to connect N input capacitors to a particular  $H_k$ . Therefore, two dimensional  $N \times N$  switches matrix is required to connect all N input capacitors to N output capacitors.

Capacitor values can be scaled to improve speed of the integrator, power dissipation, dynamic range and SC noise [6]. Lowering the capacitor value makes it sensitive to stray capacitances and KT/C noise, and in general will be dominated by sampling capacitor. On the other hand, power dissipation and chip area will increase if large capacitors are used. The dynamic range of the system can be improved by reducing the gain of the integrators thus by using output capacitors of  $kC_o$  (k > 1), we may increase the dynamic range with  $k^2$  times more area in capacitor layout. This will have another positive effect that, kernel will be more accurately matched to the desired value as matching of capacitors with larger area is always better. An alternate scheme to improve the dynamic range is by using an array of output capacitors in parallel, with one connected to the output at a time.. Computational accuracy of DC(S)T will primarily

depend on how realized capacitor ratio matches with the actual kernel coefficients. As modern CMOS processes offer capacitor matching of the order of 0.01%, which is sufficient for most signal processing applications. Capacitors may be implemented using gate-channel MOS capacitance. at lower cost with good linearity and matching[7]. A trade off, therefore has to be reached to balance the related parameters. In comparison to the architecture, described elsewhere [5], the speed and accuracy will be better with less area and circuit complexity as no external resistor-string is used.

Finally, generated output will be either C(k) or S(N - 1 - k) depending on  $\delta_{ik}$  for DCT/DST.

### 3.3. Switch Matrix

The cross point switch is the key block in the proposed design. It is a two-dimensional array of MOS transistors as shown in Fig 4. They are used such that each MOS effectively becomes the input switch of the accumulator. Therefore, by appropriate  $S_{ik}$  and  $\delta_{ik}$  sequence, N number of SC integrator connections are made for each sample and this is continued for all N input samples. From the property of the DC(S)T, it is found that all the  $C_m$ -s are required concurrently for all the C(k)/S(k). But it is to be mentioned again that DCT and DST computation requires different  $S_{i,k} \delta_{ik}$ values where one can be derived from other. The interesting feature of the switches in the switch matrix is that they are playing both the functions of a switch in the integrator as well as a connecting element between two capacitor array. Finally, when  $N = 2^{q}$ , computation each transform point requires  $2^{(q-1)}$  coefficients, and thus total number of switches P can be found by

$$P = 2 + 2^{2} + 2^{4} + \dots + 2^{2(q-1)} = (N^{2} + 2)/3$$

This is realized when k th transform points are grouped into  $\log_2 N$  groups, with  $k = 2^l \times (2j + 1)$  where  $j \in I$ ,  $l \in [0, q - 1]$ , with identical number of switches in each group, to realize compact structure with regular connection among switches leading to simplified controller.

### 3.4. Controller

The function of the digital controller is essentially to run the entire recursive computation including generation of  $S_{ik}$  as well as  $\delta_{ik}$ . It can be implemented using any finite state machine or memory lookup table[5]. The controller may be broken into two sub-controllers, one generating  $S_{ik}$  and other generating  $\delta_{ik}$ . Once  $S_{i,k}$  and  $\delta_{ik}$  are generated, they can be multiplexed with DCT/DST control to obtain C(k) or S(k). It has been already been pointed that  $S_{i,k}(DCT)=S_{i,N-1-k}$  (DST) Therefore,  $S_{i,k}$  sequence of DCT for C[k] can be used for computing S[N-1-k] of DST and hence



Fig. 4. Final DCT/DST Architecture [Controller Connections not Shown]

can be MUXed for DCT/DST. Similarly  $\delta_{ik}$  for DST becomes  $\delta_{ik}(\overline{\delta_{ik}})$  when *i* is even(odd), thus can be derived from  $\delta_{ik}$  of DCT. So, entire computation is performed thorough  $S_{i,k}$ ,  $\delta_{ik}$  program, which can be easily generated (using counters and combinational logic) and multiplexed with DCT/DST select. Alternatively, if the integrators are grouped and reordered, with *k* values {0,4}, {2,6} and {1, 3, 7, 5}, for N = 8, it is found that cross-point switches will have a regular interconnection pattern[8]. These interconnected switches in each group can be activated at regular intervals from simple counters and in such case no DEMUX is required. Now, by applying  $\delta_{ik}$  or  $(-1)^i \delta_{ik}$ , H(k) will produce either C(k) or S(N - 1 - k). Using similar principle, a different  $S'_{ik}$  and  $\delta'_{ik}$  will carry out the IDCT or IDST computation.

# 4. SIMULATION RESULTS

A basic schematic of the proposed final architecture is shown in Fig 4. A fully differential version of this circuit has been simulated using BSIM3V3 models in a  $0.18\mu m$  CMOS process. Two phase non-overlapping clocks operating at a frequency of 200KHz were used. The Opamps used for realizing the integrators had a fully differential topology with a differential input stage and class A output stage. The gain bandwidth(GB) of the Opamp was 10MHz with a load capacitance of 2pF with output resistance of 300K with  $V_{DD}$  = 1.8V. The basic advantage of differential topology is rejection of common mode noise like clock feed-through and charge injection. Potential sources of error are the finite gain of the Opamp (leading to a static error) and the finite GB, leading to a dynamic settling error. The dynamic range is bounded on the lower side by the noise floor and on the upper side by the integrator saturation limit.

In the first phase of simulation, the eigen vectors were used as inputs to check their orthogonality. The results were quite accurate with the non-zero term being about 90 dB higher than the others. For example in the case of DCT with a

| Table 1.                               |      |        |        |         |        |         |         |         |
|----------------------------------------|------|--------|--------|---------|--------|---------|---------|---------|
| Simulation Results for 8 Point DCT/DST |      |        |        |         |        |         |         |         |
| DCT                                    | C(0) | C(1)   | C(2)   | C(3)    | C(4)   | C(5)    | C(6)    | C(7)    |
| Calculated                             | 0    | 1      | 0      | -0.3516 | 0      | 0.2348  | 0       | -0.1990 |
| Simulated                              | 0    | 1      | 0      | -0.3512 | 0      | 0.2348  | 0       | -0.1992 |
| DST                                    | S(0) | S(1)   | S(2)   | S(3)    | S(4)   | S(5)    | S(6)    | S(7)    |
| Calculated                             | 1    | 0.2103 | 0.2776 | 0.1088  | 0.1823 | -0.0828 | -0.1546 | -0.0764 |
| Simulated                              | 1    | 0.2104 | 0.2764 | 0.1081  | 0.1797 | -0.0818 | -0.1525 | -0.0749 |

100 mV DC input, C(0) was 798mV while all others were less than  $25\mu$ V. The circuit was simulated for sinusoidal inputs of 20 Khz frequency, to compute the transform and its inverse. The results are given in Table 1 and found to be very close to the computed results. A plot of the errors between theoretical and simulated results is found to be less than 0.25% for most of cases. It should be mentioned that the normalized error shoots up for cases where coefficient values are very small.

# 5. CONCLUSION

In this paper, a new analog architecture is described for computing DCT and DST, and its inverse based on switchedswitched capacitor module. It's operation is digitally controlled, and multiplexing the control signals, DCT, DST and IDCT, IDST can be computed. The proposed structure is verified with preliminary simulation results. The architecture may be suitable for online DCT, DST computations.

### 6. REFERENCES

- A. K. Jain, "Fundamentals of Digital Image Processing," *Prentice Hall*, Englewood Cliffs, NJ: 1989
- [2] J. Chen, G. Shou and C. Zhou, "Digital-controlled analog circuits for weighted-sum operations," *IEICE Trans. Fundamentals*, vol. E82-A, pp. 2505-2513, Nov 1999.
- [3] S. Kawahito, M. Yoshida, M. Sasaki, K. Umehara *et.al*,"A CMOS image sensor with analog two-dimensional DCTbased compression circuits for on-chip cameras," *IEEE J. Solid-State Circuits*, vol. 32, pp. 2030-2041, Dec. 1997.
- [4] R.P.Aloe, J.F.D. Carrillo, E.S. Sinencio, J.M. Valverde *et.al*, "Programmable time-multiplexed switched-capacitor variable equalizer for arbitrary frequency response realizations," *IEEE J.Solid-State Circuits*, vol.32, pp.274-278, Feb 1997.
- [5] A. K. Mal and A. S. Dhar, "Sampled analog architecture for discrete Hartley transform," *IEEE TENCON-2003*, 14-17 th Oct, 2003, Bangalore, India,
- [6] A. Abo, "Design for Reliability of Low-voltage, Switchedcapacitor Circuits," *PhD Thesis, University of California,* Berkeley, May 1999.
- [7] H. Yoshizawa, Y. Huang, P. F. Ferguson, Jr., and G. C. Temes, "MOSFET-only switched-capacitor circuits in digital CMOS technology," *IEEE J. Solid-State Circuits*, vol. 34(6), pp.734-747, June 1999.
- [8] A. S. Dhar and S. Banerjee, "An array architecture for fast computation of discrete Hartley transform" *IEEE Trans. Circuits and Systems*, vol.38, No.9 pp.1095-1098, Sep 1991