

# SIGNAL PROCESSOR IMPLEMENTATION OF DMT BASED VDSL MODEMS

ABDELMONAEM LAKHZOURI, MARKKU RENFORS

Telecommunications Laboratory, Tampere University of Technology

P.O.Box 553, FIN-33101 Tampere, Finland

[lakhzour@cs.tut.fi](mailto:lakhzour@cs.tut.fi)

## ABSTRACT

The Discrete Multi Tone (DMT) modulation is considered a viable technique for high-speed digital transmission over the subscriber loop. It was selected as the standard for ADSL modems and is also a strong candidate in the VDSL area.

This paper investigates a programmable solution for DMT based VDSL modems using the Texas Instruments TMS320C62xx processor family. A structure for both the transmitter and the receiver sides is presented. Synchronization and equalization issues are also discussed.

## 1. INTRODUCTION

Transceiver systems are usually based on ASIC designs to reduce cost, power consumption and complexity. With the increasing performance of Digital Signal Processors (DSPs), it has become possible to use programmable DSP-based solutions while preserving these basic features. At present, most voice band modems use a programmable solution. The advantage of such a platform is its flexibility: as methods or requirements advance, algorithms can be continually improved and updated, which allows rapid upgrades. The same trend is expected for xDSL modems, and DSP-based implementation becomes the challenge for new designs. In this paper we present a solution for DMT based VDSL modems and discuss the cost of its implementation on Texas Instruments TMS320C62xx processors.

DMT is a technique for high-speed data transmission over twisted pair cables [1]. The available bandwidth is partitioned into a number of independent parallel sub-channels. Each of them is characterized by a Signal-to-Noise Ratio (SNR) that is measured whenever a connection is established and monitored thereafter. The source bit stream is encoded into a set of Quadrature Amplitude Modulation (QAM) symbols. Each symbol represents a number of bits determined as a function of the SNR of the associated sub-channel, the desired overall error probability and the target bit rate. Modulation is performed digitally using a complex-to-real Inverse Fast Fourier Transform (IFFT). Following the IFFT, a cyclic extension is added to mitigate inter-symbol interference and to preserve orthogonality between the sub-channel signals. The resulting time domain samples are converted from digital to analog format and transmitted over the medium [2].

At the receiver, after the analog-to-digital converter, the cyclic extension is stripped and the obtained data is applied to an M-point FFT block. The resulting signal is then equalized and decoded [3].

The paper is organized as follows. In Section 2, the proposed solution for DMT based VDSL transmitter will be presented. In Section 3, its implementation on TMS320C62xx will be evaluated. In Section 4, a structure of the DMT receiver will be proposed and discussed.

## 2. DMT BASED VDSL MODEMS

A DMT based VDSL modem signal contains  $N = N_c \times 2^n$  sub-carriers, where  $N_c = 256$  and  $n = 0, \dots, 4$ . The frequency spacing between two sub-carriers is fixed to 4.3125 kHz. For each sub-channel, up to 11 bits will be used depending on the corresponding SNR [3].
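As a quick worked example, the sketch below tabulates the number of sub-carriers for each value of $n$. The sampling frequency is computed under the assumption that the real-valued time signal carries $2N$ samples per symbol, i.e. $F_s = 2N \times 4.3125$ kHz, which agrees with the sampling frequencies quoted later in the paper. The function and constant names are illustrative only.

```python
SUBCARRIER_SPACING_HZ = 4312.5   # sub-carrier spacing given in the text
N_C = 256                        # base number of sub-carriers


def vdsl_dmt_parameters(n):
    """Return (N, IFFT size 2N, sampling frequency in Hz) for n = 0..4."""
    if not 0 <= n <= 4:
        raise ValueError("n must be in 0..4")
    num_subcarriers = N_C * 2 ** n
    ifft_size = 2 * num_subcarriers
    # Assumption: real-valued samples, so Fs = 2N * sub-carrier spacing.
    fs_hz = ifft_size * SUBCARRIER_SPACING_HZ
    return num_subcarriers, ifft_size, fs_hz


for n in range(5):
    N, M, fs = vdsl_dmt_parameters(n)
    print(f"n={n}: N={N}, IFFT size={M}, Fs={fs / 1e6:.3f} MHz")
```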

We denote by

$$\underline{a}_m = [a_m^0, a_m^1, \dots, a_m^{N-1}]^T$$

the vector composed of the  $N$  sub-channel symbols at the output of the constellation encoder for the  $m^{th}$  DMT symbol. The input vector to the IFFT block will be

$$\underline{b}_m = [b_m^0, b_m^1, \dots, b_m^{2N-1}]^T$$

$$\text{with } \begin{cases} b_m^0 = b_m^N = 0 \\ b_m^k = a_m^k \\ b_m^{N+k} = \text{conj}(a_m^{N-k}) \end{cases} \text{ for } 0 < k < N. \quad (1)$$

---

This work was carried out in the project “Fast DSL technologies in broadband transmission” funded by the National Technology Agency of Finland (TEKES).

The discrete output signal will have real-valued components  $\underline{c}_m = [c_m^0, c_m^1, \dots, c_m^{2N-1}]^T$

$$\text{where } c_m^k = \sum_{i=0}^{2N-1} b_m^i \exp\left(j \frac{2\pi i k}{2N}\right). \quad (2)$$
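The following minimal sketch illustrates why the output is real-valued. It builds $\underline{b}_m$ from a short symbol vector as in Eq. (1) and then evaluates the synthesis directly, reading Eq. (2) as the usual unnormalised inverse DFT over $2N$ points (an interpretation on our part). The function names and the test symbols are hypothetical, and a real implementation would use a fast algorithm instead of the $O(M^2)$ sum.

```python
import cmath


def hermitian_extend(a):
    """Eq. (1): build the 2N-point IFFT input from N sub-channel symbols."""
    n = len(a)
    b = [0j] * (2 * n)              # b[0] (DC) and b[N] (Nyquist) stay zero
    for k in range(1, n):
        b[k] = complex(a[k])
        b[n + k] = complex(a[n - k]).conjugate()
    return b


def synthesize(b):
    """Direct evaluation of c[k] = sum_i b[i] * exp(j*2*pi*i*k/M), M = len(b)."""
    m = len(b)
    return [sum(b[i] * cmath.exp(2j * cmath.pi * i * k / m) for i in range(m))
            for k in range(m)]


# Illustrative symbols for N = 4; a[0] is ignored because DC is unused.
a = [0, 1 + 1j, -1 + 3j, 3 - 1j]
b = hermitian_extend(a)
c = synthesize(b)
print(max(abs(v.imag) for v in c))      # rounding residue only
print([round(v.real, 6) for v in c])
```

The imaginary parts vanish up to rounding because each pair $b^k$, $b^{2N-k}$ contributes a complex-conjugate pair of terms.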

The cyclic extension is formed by a cyclic prefix of length  $L_{CP}$  (the last  $L_{CP}$  samples of  $\underline{c}_m$ ) that will be inserted in front of the  $2N$  time domain output samples and a cyclic suffix of length  $L_{CS}$  (the first  $L_{CS}$  samples of  $\underline{c}_m$ ) that will be appended at the end.
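The construction of the cyclic extension can be sketched as follows. The function name and the toy symbol are illustrative; only the prefix and suffix rule is taken from the text.

```python
def add_cyclic_extension(c, l_cp, l_cs):
    """Return prefix + c + suffix.

    The prefix is the last l_cp samples of c and the suffix is the
    first l_cs samples of c.
    """
    if l_cp > len(c) or l_cs > len(c):
        raise ValueError("extension longer than the symbol")
    prefix = c[len(c) - l_cp:]
    suffix = c[:l_cs]
    return prefix + c + suffix


symbol = [10, 11, 12, 13, 14, 15, 16, 17]
extended = add_cyclic_extension(symbol, l_cp=3, l_cs=2)
print(extended)  # [15, 16, 17, 10, 11, 12, 13, 14, 15, 16, 17, 10, 11]
```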



**Figure 1:** The proposed solution for the DMT transmitter

The proposed solution for the DMT transmitter is shown in Figure 1. It is composed of four types of devices: one FPGA (Field Programmable Gate Array) to handle the constellation encoding, a set of DSPs to carry out the software processing of the modulation, one external memory to buffer data between the output of the FPGA and the input of the DSPs, and one micro-controller to manage the communication between the devices and to control the initialization of the modem. The number of DSPs used depends on the number of points of the FFT, i.e., on the number of sub-channels used.

The architecture shown in Figure 1 influences the choice of the micro-controller. It should be able to interface with the various device types (DSP and FPGA), support all the on-chip processing units, and support interrupt servicing, acting as both source and target, for all the devices to be controlled. It must also have a sufficient number of I/Os to communicate with all the units to be managed.

## 3. COMPLEXITY OF THE IMPLEMENTATION

The DMT transmitter was implemented on a fixed-point TMS320C6201 processor at a clock rate of 200 MHz (5 ns per cycle). The chosen number of sub-channels is 512, so a 1024-point IFFT block is used. The length of the cyclic extension is 80 samples and the sampling frequency (Fs) is 4.416 MHz [4]. The constellation encoding is performed by an FPGA to guarantee an efficient implementation. Implementing the encoder on the TMS320C62xx processor takes 154 $\mu$s per DMT block, which is too long for VDSL modems. This block relies heavily on bit-level operations, which makes a DSP-based solution rather slow, whereas this kind of operation can be implemented efficiently on an FPGA.

The software processing of the transmitter is therefore limited to the three tasks described below.

### 3.1 Hermitian property creation

Neither the DC nor the Nyquist component is used. The creation of the vector  $\underline{b}_m$  was described in equation (1). The implementation of the Hermitian property needs 1597 cycles per DMT block, which takes about 8 $\mu$s of execution time in the case of 512 sub-channels.

### 3.2 M-point IFFT

The second block in the DMT transmitter scheme is the  $M$-point IFFT block with  $M = 2 \times N$ . Two algorithms proposed by Texas Instruments Inc. [5] were evaluated to see which is more efficient and suitable for DMT based VDSL modulation:

- Radix 2 with decimation in frequency:

In this case  $M = 2^k$  and the execution time in cycles is:

$$t_c = (2 \times M + 7) \log_2(M) + 9 + \frac{M}{4} \quad (3)$$

- Radix 4 with decimation in frequency:

In this case  $M = 4^k$  and the execution time in cycles is:

$$t_c = \left( \frac{10 \times M}{4} + 33 \right) \log_4(M) + 7 + \frac{M}{4} \quad (4)$$
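The two formulas above can be compared numerically with a short sketch. It assumes that $t_c$ is expressed in processor cycles, and the function names are illustrative.

```python
def _log2_exact(m):
    """Return log2(m) for a power of two, otherwise raise ValueError."""
    if m < 4 or m & (m - 1):
        raise ValueError("size must be a power of two, at least 4")
    return m.bit_length() - 1


def radix2_cycles(m):
    """Eq. (3): radix-2 decimation-in-frequency FFT of size m = 2^k."""
    return (2 * m + 7) * _log2_exact(m) + 9 + m // 4


def radix4_cycles(m):
    """Eq. (4): radix-4 decimation-in-frequency FFT of size m = 4^k."""
    stages2 = _log2_exact(m)
    if stages2 % 2:
        raise ValueError("size must be a power of four")
    return (10 * m // 4 + 33) * (stages2 // 2) + 7 + m // 4


for size in (256, 1024, 4096):
    print(size, radix2_cycles(size), radix4_cycles(size))
```

For every size that is a power of 4, the radix-4 count is lower, since each radix-4 stage costs about $2.5M$ cycles but only half as many stages are needed.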

The radix-4 algorithm is therefore the more efficient choice, but it cannot be applied directly when  $M$  is not a power of 4. In this case the following solution is proposed:

Instead of using one FFT block of size  $M$ , we use two radix-4 FFT blocks of size  $N$ , with some further treatment. The two input vectors of size  $N$  each are given by:

$$\underline{x}_m = [x_m^0, \dots, x_m^{N-1}]^T \text{ and } \underline{y}_m = [y_m^0, \dots, y_m^{N-1}]^T$$

where

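$$\begin{cases} x_m^k = b_m^k + b_m^{N+k} \\ y_m^k = (b_m^k - b_m^{N+k}) \times w_{2N}^k \end{cases} \quad \text{for } k = 0, \dots, N-1 \quad (5)$$

and  $w_{2N}^k = \exp(j \frac{2\pi k}{2N})$. The  $N$-point transform of  $\underline{x}_m$  then gives the even-indexed output samples and that of  $\underline{y}_m$  the odd-indexed ones.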
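The split can be checked numerically with the sketch below. It uses the standard decimation-in-frequency form, i.e. the sum $b^k + b^{N+k}$ for $\underline{x}_m$, the difference $b^k - b^{N+k}$ for $\underline{y}_m$, and the twiddle factor $\exp(j 2\pi k / 2N)$. A direct $O(N^2)$ transform stands in for the two radix-4 blocks, and all names are illustrative.

```python
import cmath


def idft(v):
    """Unnormalised inverse DFT, computed directly."""
    m = len(v)
    return [sum(v[i] * cmath.exp(2j * cmath.pi * i * k / m) for i in range(m))
            for k in range(m)]


def split_ifft(b):
    """Compute the 2N-point transform of b from two N-point transforms."""
    n = len(b) // 2
    x = [b[k] + b[n + k] for k in range(n)]
    y = [(b[k] - b[n + k]) * cmath.exp(1j * cmath.pi * k / n) for k in range(n)]
    c = [0j] * (2 * n)
    c[0::2] = idft(x)    # even-indexed output samples
    c[1::2] = idft(y)    # odd-indexed output samples
    return c


b = [0, 1 + 1j, -1 + 3j, 3 - 1j, 0, 3 + 1j, -1 - 3j, 1 - 1j]
direct = idft(b)
split = split_ifft(b)
print(max(abs(p - q) for p, q in zip(direct, split)))
```

The printed difference is at rounding level, which confirms that interleaving the two half-size outputs reproduces the full-size transform.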

The implementation of the FFT algorithm with 512 sub-channels needs 13028 cycles, which takes about 65 $\mu$s of execution time. The digit/bit reversal of the output data needed after the FFT uses an index table that is assumed to reside in the on-chip memory of the DSP, computed and stored during the initialization procedure.
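A precomputed index table of this kind could be generated as in the sketch below. This is a generic digit-reversal routine written for illustration, not the table layout used by the Texas Instruments code.

```python
def digit_reverse_table(size, radix):
    """Map each index to the index with its base-`radix` digits reversed."""
    digits, span = 0, 1
    while span < size:
        span *= radix
        digits += 1
    if span != size:
        raise ValueError("size must be a power of radix")
    table = []
    for index in range(size):
        value, reversed_index = index, 0
        for _ in range(digits):
            reversed_index = reversed_index * radix + value % radix
            value //= radix
        table.append(reversed_index)
    return table


def reorder(data, table):
    """Apply the table once, as would be done after each FFT."""
    return [data[table[i]] for i in range(len(data))]


print(digit_reverse_table(8, 2))    # [0, 4, 2, 6, 1, 5, 3, 7]
print(digit_reverse_table(16, 4))
```

Because the table depends only on the transform size and radix, it can be built once at start-up and reused for every DMT block.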

### 3.3 Cyclic extension padding

The third block in the DMT transmitter scheme is the cyclic extension padding. In the case of 512 sub-channels, the Cyclic Extension (CE), i.e.,  $L_{CP} + L_{CS}$ , is 80 samples. The implementation of this block needs 1106 cycles, which takes about 5.5 $\mu$s.

As a result, the whole task of the processor takes 15731 cycles, which corresponds to 79  $\mu$ s. This total time is quite sufficient for a sampling rate of 4.416 MHz.
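The cycle budget can be reproduced with a few lines of arithmetic. The cycle counts and the 200 MHz clock are taken from the text; the variable names are illustrative.

```python
CLOCK_HZ = 200e6   # TMS320C6201 clock rate quoted in the text

# Cycle counts per DMT block for N = 512, as reported in Sections 3.1 to 3.3.
TASK_CYCLES = {
    "hermitian_property": 1597,
    "ifft": 13028,
    "cyclic_extension": 1106,
}


def cycles_to_microseconds(cycles, clock_hz=CLOCK_HZ):
    """Convert a cycle count into execution time in microseconds."""
    return cycles / clock_hz * 1e6


total_cycles = sum(TASK_CYCLES.values())
for name, cycles in TASK_CYCLES.items():
    print(f"{name}: {cycles} cycles = {cycles_to_microseconds(cycles):.2f} us")
print(f"total: {total_cycles} cycles = {cycles_to_microseconds(total_cycles):.2f} us")
```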

The evaluation of data memory needed for the implementation shows that the 64 Kbytes available on the chip are quite sufficient; just 13 Kbytes of data memory are needed.

Considering the actual performance of the DSPs used (TMS320C62xx processor family), we found that when the number of sub-channels exceeds 512, it is not possible to use only one DSP on the transmitter side. Independently of the number of sub-channels used, the DMT block length is always about 116 $\mu$s.

**Table 1:** Evaluation of the implementation according to the number of sub-channels used.

| N    | Computational time (cycles) | Computational time ($\mu$s) | Fs (MHz) |
|------|-----------------------------|-----------------------------|----------|
| 1024 | 39740                       | 199                         | 8.832    |
| 2048 | 74191                       | 371                         | 17.664   |
| 4096 | 177022                      | 885                         | 35.328   |

According to Table 1, up to 8 DSPs are needed for the modulation. The incoming symbol block is split by a factor of 2 progressively, according to the transformation presented in Section 3.2, until a block size of 512 is reached, which is handled by one DSP.
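The progressive splitting translates into a simple count of DSPs, sketched below. The function name and the `block_limit` parameter are illustrative; the limit of 512 sub-channels per DSP is the one stated in the text.

```python
def dsp_count(num_subchannels, block_limit=512):
    """Number of DSPs when blocks are halved until they fit on one DSP."""
    count, size = 1, num_subchannels
    while size > block_limit:
        size //= 2      # one application of the split from Section 3.2
        count *= 2      # each half goes to its own DSP
    return count


for n in (512, 1024, 2048, 4096):
    print(n, dsp_count(n))
```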

## 4. PROPOSED STRUCTURE FOR THE DMT RECEIVER

The DMT based VDSL modem receiver is presented in Figure 2. The main features that the receiver must implement are synchronization, demodulation, equalization, and de-mapping.

### 4.1 Synchronization

Symbol synchronization exploits the cyclic extension by means of the correlation method. The correlation is computed over the CE (cyclic extension) samples between the received sequence and a version of it shifted by 2N samples. The correlation maximum indicates the symbol boundary [6][7].
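A minimal sketch of this correlation search is given below. It is an idealised, noise-free illustration with silence around a single symbol, not the implementation that was benchmarked; the function name, the parameter names and the toy signal are hypothetical.

```python
def find_symbol_start(r, fft_size, ce_len, search_len):
    """Return the offset d in [0, search_len) that maximises
    sum over n < ce_len of r[d + n] * r[d + n + fft_size]."""
    best_offset, best_metric = 0, float("-inf")
    for d in range(search_len):
        metric = sum(r[d + n] * r[d + n + fft_size] for n in range(ce_len))
        if metric > best_metric:
            best_offset, best_metric = d, metric
    return best_offset


# Toy real-valued symbol with 2N = 16 samples.
body = [1, -1, 1, 1, -1, 1, -1, -1, 1, 1, -1, 1, 1, -1, -1, 1]
extended = body[-2:] + body + body[:2]      # prefix and suffix of 2 samples
received = [0] * 7 + extended + [0] * 7     # idealised: silence around it
print(find_symbol_start(received, fft_size=16, ce_len=4, search_len=12))  # 7
```

The peak appears at offset 7, where the extended symbol begins, because only there do all four extension samples coincide with their copies 16 samples later.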

For each window of 2N samples, the following assumption was adopted to guarantee that only one DSP is used for this function: the search for the correlation peak is made over the first L samples, with  $L \leq N$ . Independently of the number of sub-channels used, the duration of 2N samples is 232 $\mu$s. The implementation cost is shown in Table 2.

**Table 2:** The implementation cost of the synchronization.

| N    | Computational time (cycles) | Computational time ($\mu$s) | L (samples) |
|------|-----------------------------|-----------------------------|-------------|
| 512  | 44206                       | 221                         | 512         |
| 1024 | 45200                       | 226                         | 360         |
| 2048 | 45405                       | 227                         | 220         |
| 4096 | 44403                       | 229                         | 130         |

If the search for the correlation peak has to be done over the entire DMT symbol, more than one DSP will be needed when  $N \geq 1024$ .

The timing error in the sampling instants is corrected in the frequency domain by rotating the FFT output symbols. This is referred to in [6] as the rotor property. Its implementation is incorporated in the frequency domain equalizer.
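The principle behind the rotation can be illustrated with the DFT shift theorem: a cyclic time shift of $d$ samples multiplies tone $k$ by $\exp(j 2\pi k d / M)$, so the shift is undone by the opposite rotation. The sketch below assumes an integer, purely cyclic shift and a forward DFT with a negative exponent; these conventions and all names are ours, and [6] should be consulted for the actual scheme.

```python
import cmath


def dft(x):
    """Forward DFT, computed directly."""
    m = len(x)
    return [sum(x[n] * cmath.exp(-2j * cmath.pi * k * n / m) for n in range(m))
            for k in range(m)]


def rotor_correct(spectrum, delay):
    """Rotate tone k by exp(-j*2*pi*k*delay/M) to undo a timing offset."""
    m = len(spectrum)
    return [spectrum[k] * cmath.exp(-2j * cmath.pi * k * delay / m)
            for k in range(m)]


x = [1.0, 2.0, 0.0, -1.0, 3.0, 0.5, -2.0, 1.5]
d = 2
shifted = x[d:] + x[:d]             # shifted[n] = x[(n + d) mod M]
corrected = rotor_correct(dft(shifted), d)
reference = dft(x)
print(max(abs(p - q) for p, q in zip(corrected, reference)))
```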

### 4.2 Demodulation

The demodulation uses the same algorithms as the transmitter side, so it has the same complexity. Depending on the number of sub-channels N, up to 8 FFT blocks will be used in parallel.

### 4.3 Equalization

Only frequency domain equalization (FEQ) was considered in this paper. A T-tap per-tone scheme was adopted [8]. Originally, the equalization was done in two parts: a T-tap time domain equalizer (TEQ) and a 1-tap FEQ. The idea is to combine the TEQ and the FEQ in one single frequency domain equalizer [9]. At first sight, T different FFT blocks would be needed in the equalization scheme, but as shown in [8], only one block has to be computed and the other T-1 blocks are deduced iteratively. The rotor property was implemented only on the computed FFT block.

We have shown that, independently of the number of sub-channels used, it is possible to implement at least a 1-tap FEQ (the case when we disregard the TEQ) with a single DSP. Table 3 gives the maximum number of taps that can be used according to the number of sub-channels N.

**Table 3:** The implementation cost of the equalization and the rotor property.

| N    | Computational time (cycles) | Computational time ($\mu$s) | T (taps) |
|------|-----------------------------|-----------------------------|----------|
| 512  | 41997                       | 210                         | 5        |
| 1024 | 44521                       | 222                         | 4        |
| 2048 | 42594                       | 212                         | 3        |
| 4096 | 39403                       | 197                         | 1        |
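For the simplest case in the table, the 1-tap FEQ, equalization reduces to one complex multiplication per tone. The sketch below assumes that the channel gain of each tone is known exactly, which in practice would come from training; the function name and values are illustrative, and the T-tap per-tone structure of [8] is not reproduced here.

```python
def one_tap_feq(received_tones, channel_gains):
    """Scale every tone by the inverse of its (assumed known) channel gain."""
    coefficients = [1 / h for h in channel_gains]   # fixed after training
    return [w * y for w, y in zip(coefficients, received_tones)]


sent = [1 + 1j, -1 + 1j, 3 - 1j]
channel = [0.5 + 0.5j, 2 + 0j, -1j]
received = [h * s for h, s in zip(channel, sent)]
equalized = one_tap_feq(received, channel)
print(equalized)
```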

### 4.4 Overall receiver

Figure 2 shows the proposed structure when  $N=512$ . Exactly 3 DSPs will be used together with one FPGA and one micro-controller. The startup procedure and the symbol synchronization will be done with the first DSP, the demodulation process will be done with a second DSP, and the frequency domain equalizer and maintaining the synchronization will be done with a third DSP. The demapping will be carried out with an FPGA and the entire system will be monitored by one micro-controller.



**Figure 2:** The proposed solution for DMT receiver

## 5. CONCLUSION

In this paper, we presented a structure for a DMT based VDSL modem transceiver using the TMS320C62xx processor family. We studied in detail the complexity of both the transmitter and the receiver sides.

On the transmitter side, we showed that a single DSP suffices if N is no more than 512, and on the receiver side, we showed that 3 DSPs are needed for 512 sub-channels.

With the new generation of faster processors (Texas Instruments TMS320C64x), it seems possible to implement the receiver with up to 2 DSPs even for a higher number of sub-channels.

## 6. REFERENCES

- [1]: ANSI, “*Network and customer installation interfaces, Asymmetric Digital Subscriber Line (ADSL), Metallic Interface*,” T1.413-1998, 1998.
- [2]: J. Bingham, “*Multi-carrier modulation for data transmission: an idea whose time has come*,” IEEE Commun. Mag., vol. 28, no. 5, May 1990, pp. 5-14.
- [3]: Y. Chen, “*DSL, simulation techniques and standards development for digital subscriber line systems*,” Macmillan technology series, 1998, pp. 436-442.
- [4]: ETSI, “*Transmission and Multiplexing; access transmission systems on metallic access cables; Very high speed Digital Subscriber Line (VDSL), Part 2: transceiver spec.*,” Draft technical spec., v0.0.9, March 2000.
- [5]: <ftp://ftp.ti.com/>, Proposed implementation of FFT algorithms.
- [6]: T. Pollet and M. Peeters, “*Synchronisation with DMT modulation*,” IEEE Commun. Mag., April 1999, pp. 80-86.
- [7]: T. Pollet and M. Peeters, “*A new digital timing correction scheme for DMT systems combining temporal and frequential signal properties*,” ICC 2000, pp. 1805-1808.
- [8]: K. Van Acker, G. Leus, M. Moonen, O. van de Wiel, T. Pollet, “*Per tone equalization for DMT receivers*,” GLOBECOM’99, 1999, vol. 5, pp. 2311-2315.
- [9]: T. Pollet, M. Peeters, M. Moonen, L. Vandendorpe, “*Equalization for DMT based broadband modems*,” IEEE Commun. Mag., vol. 38, no. 5, May 2000, pp. 106-113.