# **INTERLEAVED TRELLIS CODED MODULATION AND DECODING FOR 10 GIGABIT ETHERNET OVER COPPER**

Yongru Gu and Keshab K. Parhi

Department of Electrical and Computer Engineering University of Minnesota, Minneapolis, MN 55455 Email: {yrg, parhi}@ece.umn.edu

## ABSTRACT

It is highly likely that 10 Gigabit Ethernet over copper (10GBASE-T) transceivers will use 10-level pulse amplitude modulation (PAM10) as well as a 4D trellis code as in 1000BASE-T. A traditional trellis coded modulation scheme as in 1000BASE-T leads to a design where the corresponding decoder, which has a long critical path, needs to operate at 833 MHz. It is difficult to meet the critical path requirements of such a decoder. To solve the problem, two interleaved trellis coded modulation schemes are proposed. The inherent decoding speed requirements are relaxed by factors of 4 and 2, respectively. Parallel decoding of the interleaved codes requires multiple decoders. To reduce the hardware overhead, time-multiplexed or folded decoder structures are proposed where only one decoder is needed and each delay in the decoder is replaced with four delays for scheme 1 and two delays for scheme 2. These delays can be used to reduce the critical path. Compared with the conventional decoder, the folded decoders for the two proposed schemes can achieve speedups of 4 and 2, respectively. Simulation results show that the error-rate performances of the two schemes are quite close to that of the conventional scheme.

## 1. INTRODUCTION

10GBASE-T is the next generation of high-speed Ethernet LAN. It will serve as a follow-up to Gigabit Ethernet over copper (1000BASE-T). Currently, the IEEE 802.3 10GBASE-T study group is investigating the feasibility of transmitting 10 Gbps over 4 unshielded twisted pairs [1].

Like 1000BASE-T, 10GBASE-T achieves its 10 Gbps throughput with four wire pairs and eight transceivers (four at each end), each with a 2.5 Gb/s data rate, as shown in Fig. 1. According to the proposals presented in the IEEE 802.3 10GBASE-T study group, 10GBASE-T will probably use PAM10 combined with a 4D trellis code as the basis for its transmission scheme. The symbol rate of this scheme is 833 Mbaud, with each symbol representing 3 bits of information. Coding 3 bits of information requires only an 8-level PAM constellation. The additional two levels are used for control signals as well as the 4D trellis code in order to improve the performance of the 10GBASE-T transceivers.
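The rate figures quoted above follow from simple arithmetic. The short Python sketch below reproduces them; the constant and function names are illustrative only and are not taken from any 10GBASE-T proposal.

```python
# Line-rate arithmetic for the PAM10 scheme described in the text.
WIRE_PAIRS = 4            # four unshielded twisted pairs
BITS_PER_SYMBOL = 3       # information bits carried by each 1D symbol
TARGET_BPS = 10e9         # aggregate throughput of 10 Gbps


def required_symbol_rate(target_bps, pairs, bits_per_symbol):
    """Symbol rate (in baud) that each wire pair must sustain."""
    return target_bps / (pairs * bits_per_symbol)


symbol_rate = required_symbol_rate(TARGET_BPS, WIRE_PAIRS, BITS_PER_SYMBOL)
per_pair_bps = symbol_rate * BITS_PER_SYMBOL   # data rate of one transceiver
symbol_period_ns = 1e9 / symbol_rate           # time available per symbol

print(f"symbol rate   : {symbol_rate / 1e6:.1f} Mbaud")
print(f"rate per pair : {per_pair_bps / 1e9:.2f} Gb/s")
print(f"symbol period : {symbol_period_ns:.2f} ns")
```

The symbol period of about 1.2 ns is the time budget that a decoder running at the full symbol rate has for each iteration.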



Fig. 1. 10 Gigabit Ethernet over UTP

Fig. 2 shows the block diagram of the 10GBASE-T transceiver, which is adapted from that of the 1000BASE-T transceiver in [2, 3]. Except for the PCS (Physical Coding Sublayer), the diagram shows only the processing blocks for one wire pair; the other three pairs have a similar block diagram. The PCS block generates four 1D symbols, each representing 3 bits of data, one for each of the 4 channels. The symbol for each channel then goes through a shaping filter, a D/A converter, and a hybrid, and is finally coupled to the twisted pair.



Fig. 2. Block diagram of the 10GBASE-T transceiver

On the receive path, the received analog signal is first digitized by an A/D converter sampling at 833 MHz. The output of the A/D converter is filtered by an adaptive feedforward equalizer (FFE), which performs channel equalization and precursor ISI (inter-symbol interference) cancellation. Echo from the transmitter on the same channel and near-end crosstalk (NEXT) from the adjacent channels are cancelled with their respective adaptive cancellers. The joint decision feedback equalizer and trellis decoder (JED) is used to remove postcursor ISI and decode the trellis code.

Several approaches can be used to implement the JED. In the presence of ISI and additive Gaussian noise, it is well established that maximum-likelihood sequence estimation (MLSE), implemented by the Viterbi algorithm, provides optimal performance in terms of bit error rate. However, the complexity of the algorithm is exponential in the sum of the channel memory length and the trellis code memory length. Thus it is highly desirable to reduce the complexity of the detection technique while retaining near-optimal performance. One of the most powerful approaches for doing so is parallel decision-feedback decoding. In this approach, an independent feedback signal is computed for each path in the Viterbi decoder as the convolution of the sequence of symbols associated with that path with the coefficients of the feedback filter of the decision feedback equalizer [4, 5].
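The per-path feedback computation can be sketched as follows. This is a minimal illustration for a single wire pair, not the decoder of [3]: the function names, the two-tap filter, and the two survivor paths are hypothetical and chosen only to keep the example small.

```python
def survivor_isi(feedback_taps, survivor_symbols):
    """Postcursor ISI estimate for one survivor path.

    survivor_symbols[0] is the most recent symbol decided along that path,
    survivor_symbols[1] the one before it, and so on.
    """
    return sum(f * a for f, a in zip(feedback_taps, survivor_symbols))


def pdfd_feedback(feedback_taps, survivors):
    """One independent ISI estimate per survivor path (one per trellis state)."""
    return [survivor_isi(feedback_taps, path) for path in survivors]


def isi_free_samples(received, feedback_taps, survivors):
    """Subtract each path's own ISI estimate from the received sample."""
    return [received - isi for isi in pdfd_feedback(feedback_taps, survivors)]


# Hypothetical 2-tap feedback filter and two survivor paths.
taps = [0.5, 0.25]
survivors = [[1, -3], [3, 5]]
print(pdfd_feedback(taps, survivors))          # [-0.25, 2.75]
print(isi_free_samples(2.0, taps, survivors))  # [2.25, -0.75]
```

Each state therefore sees a differently corrected sample, and branch metrics are computed from that state's own corrected value. Because the survivor symbols depend on the previous decoding step, this computation sits inside the decoder's feedback loop.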

If the trellis coded modulation is used in a conventional way as in 1000BASE-T, the decoding speed requirement for the corresponding parallel decision-feedback decoder (PDFD), referred to as the T-PDFD (the PDFD for the traditional encoding and modulation scheme), is 833 MHz. However, implementing a PDFD that operates at 833 MHz is extremely challenging because of the long feedback loop in the decoder structure.

In this paper, two interleaved trellis coded modulation schemes are proposed, which relax the decoding speed requirement by factors of 4 and 2, respectively. Parallel decoding requires four PDFDs (referred to as S1-PDFDs) for scheme 1 and two PDFDs (S2-PDFDs) for scheme 2. To reduce the hardware overhead, we also propose the use of area-efficient time-multiplexed, or folded, decoders for the two schemes, referred to as the FS1-PDFD and the FS2-PDFD, respectively. The critical path of the FS1-PDFD is only one fourth of that of an S1-PDFD, while the critical path of the FS2-PDFD is only half of that of an S2-PDFD.

The idea of interleaving is similar to that in [6]. However, interleaving is exploited here across 4 wire pairs rather than on the single cable considered in [6]. In addition, either a folded or a parallel decoding structure can be used.

The rest of the paper is organized as follows. Section 2 reviews the traditional trellis coded modulation and decoding scheme adapted from 1000BASE-T. Section 3 describes the proposed interleaved trellis coded modulation and decoding schemes. Section 4 compares the error-rate performance of the proposed schemes and the conventional scheme.

## 2. TRADITIONAL TRELLIS CODED MODULATION AND DECODING

In this section, the trellis coded modulation based on 1000BASE-T is described first. Next, the straightforward implementation of a T-PDFD and its critical path are reviewed.

The 4D trellis code is similar to the one in 1000BASE-T. The difference is that the symbol alphabet is now $\{-9, -7, -5, -3, -1, 1, 3, 5, 7, 9\}$, and the two 1D subsets A and B become $\{-9, -5, -1, 3, 7\}$ and $\{-7, -3, 1, 5, 9\}$, respectively. The formation of the eight 4D subsets, S0 through S7, is similar to that in 1000BASE-T [2, 3], but each 4D subset now contains 1024 4D symbols, and accordingly the number of parallel transitions for each state transition becomes 1024.
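The partition and the symbol counts above can be checked with a few lines of Python. This is only a consistency sketch: the helper name `min_spacing` is hypothetical, and the single redundant bit per 4D symbol is an assumption carried over from the 1000BASE-T code rather than something stated in this paper.

```python
ALPHABET = [-9, -7, -5, -3, -1, 1, 3, 5, 7, 9]   # PAM10 levels
SUBSET_A = ALPHABET[0::2]                        # [-9, -5, -1, 3, 7]
SUBSET_B = ALPHABET[1::2]                        # [-7, -3, 1, 5, 9]


def min_spacing(levels):
    """Smallest distance between two distinct levels of a 1D set."""
    ordered = sorted(levels)
    return min(b - a for a, b in zip(ordered, ordered[1:]))


# Splitting the alphabet into A and B doubles the minimum 1D spacing.
print(min_spacing(ALPHABET), min_spacing(SUBSET_A), min_spacing(SUBSET_B))

INFO_BITS_PER_4D = 12     # from the text: 3 bits per 1D symbol, 4 dimensions
REDUNDANT_BITS = 1        # ASSUMPTION: one coded bit added, as in 1000BASE-T
NUM_4D_SUBSETS = 8        # S0 through S7

CODED_POINTS = 2 ** (INFO_BITS_PER_4D + REDUNDANT_BITS)
POINTS_PER_SUBSET = CODED_POINTS // NUM_4D_SUBSETS
AVAILABLE_POINTS = len(ALPHABET) ** 4

print(CODED_POINTS, POINTS_PER_SUBSET, AVAILABLE_POINTS)
```

Under that assumption the count per subset comes out to the 1024 symbols quoted in the text, and the 10-level alphabet offers more 4D points than the code needs, which is consistent with the extra levels also being available for control signals.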

The trellis coded modulation can be used in a traditional way as in 1000BASE-T, where the four wire pairs share one trellis encoder, as illustrated in Fig. 3. In each iteration, the encoder takes 12 bits of information and converts them to a 4D symbol at a rate of 833 MHz. Each 4D symbol contains four 1D symbols, which are transmitted over the four pairs with one dimension per pair.



Fig. 3. Traditional encoding scheme

| Wire pair                          | Time n+1      | Time n      |
|------------------------------------|---------------|-------------|
| Wire Pair 0                        | Sample n+1,0  | Sample n,0  |
| Wire Pair 1                        | Sample n+1,1  | Sample n,1  |
| Wire Pair 2                        | Sample n+1,2  | Sample n,2  |
| Wire Pair 3                        | Sample n+1,3  | Sample n,3  |
| Input to joint equalizer & decoder | 4D Sample n+1 | 4D Sample n |

Fig. 4. Decoding scheme for the traditional encoding scheme

Fig. 4 shows the corresponding decoding scheme. To meet the throughput requirement, the joint equalizer and decoder (the T-PDFD in this paper) needs to operate at 833 MHz. The architecture of the T-PDFD is shown in Fig. 5. It is similar to the one for 1000BASE-T, which is described in detail in [3]. The T-PDFD consists of a 1D BMU (branch metric unit), a 4D BMU, an ACSU (add-compare-select unit), an SMU (survivor memory unit), and a DFU (decision feedback unit). In this paper, we assume that the DFU has 20 taps. As shown in Fig. 5, all of these units are inside a recursive loop, which limits the throughput of the T-PDFD. As described in [3], the critical path of a 14-tap T-PDFD consists of one slicing operation, one squaring operation, 9 additions/subtractions, one 2-to-1 mux, two 4-to-1 muxes, one register delay, and some random logic. The critical path of the 20-tap T-PDFD is even longer. It is difficult for the T-PDFD to operate at 833 MHz even with the latest CMOS technology.

Fig. 5. Block diagram of the T-PDFD

One common approach to solving the bottleneck problem is to develop high-speed PDFD designs such as those for 1000BASE-T [3, 7, 8]. However, most of the proposed techniques may not be suitable for 10GBASE-T. For example, the decision feedback pre-filtering technique in [3] only works for channels where the energy of the postcursor ISI is concentrated in the first one or two taps; otherwise, it may result in significant performance loss. The complexity of [7] is exponential in the channel memory length, so it is only suitable for channels with short memory, whereas the channel memory of 10GBASE-T is substantially longer than that of 1000BASE-T. Based on look-ahead techniques, a pipelined PDFD is proposed in [8] which can achieve a speedup of around 2. But it may still not be fast enough, as the time budget for each iteration in 10GBASE-T is only 1.2 ns.

An alternate approach is to change the encoding and modulation scheme such that the inherent decoding speed requirement for the decoder can be relaxed, as proposed in the next section.

## 3. NEW TRELLIS ENCODING & DECODING SCHEME

Fig. 6 shows the proposed interleaved trellis coded modulation scheme 1. As the figure shows, each wire pair has its own encoder, and the encoding for different wire pairs is independent. In each iteration, the four encoders together take 48 bits of information, 12 bits per encoder, at a rate of 208.3 MHz. Each constituent encoder is the same as the one used in the traditional encoding scheme. The four 1D symbols of each resulting 4D symbol go through a parallel-to-serial converter and are transmitted consecutively over the same wire pair.
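The difference between the traditional mapping and scheme 1 can be made concrete with a small sketch. The function names are hypothetical, and the order in which the four dimensions are serialized onto a pair (dimension 0 first) is an assumption, since the text does not specify it.

```python
def traditional_map(symbols_4d):
    """Traditional scheme: one shared encoder.

    Dimension d of every 4D symbol is sent on wire pair d, so one 4D symbol
    occupies a single symbol period across all four pairs.
    """
    pairs = [[] for _ in range(4)]
    for sym in symbols_4d:
        for d, level in enumerate(sym):
            pairs[d].append(level)
    return pairs


def scheme1_map(per_pair_symbols_4d):
    """Scheme 1: one encoder per wire pair.

    per_pair_symbols_4d[p] is the 4D symbol stream of encoder p. Each 4D
    symbol is serialized (ASSUMED order: dimension 0 first) onto pair p,
    so it occupies four consecutive symbol periods on that pair.
    """
    return [[level for sym in stream for level in sym]
            for stream in per_pair_symbols_4d]


# Four symbol periods of traffic in both schemes (levels are arbitrary).
shared = [(1, 3, 5, 7), (-1, -3, -5, -7), (9, 9, 9, 9), (-9, -9, -9, -9)]
print(traditional_map(shared))
# [[1, -1, 9, -9], [3, -3, 9, -9], [5, -5, 9, -9], [7, -7, 9, -9]]

independent = [[(1, 3, 5, 7)], [(-1, -3, -5, -7)], [(9, 9, 9, 9)], [(-9, -9, -9, -9)]]
print(scheme1_map(independent))
# [[1, 3, 5, 7], [-1, -3, -5, -7], [9, 9, 9, 9], [-9, -9, -9, -9]]
```

In both cases each pair carries four 1D symbols in four symbol periods, so the line rate is unchanged. What changes is that each scheme 1 encoder emits one 4D symbol per four symbol periods, which is where the 208.3 MHz encoding rate comes from.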



Fig. 6. Proposed trellis encoding scheme 1

Since the encoding for each wire pair is independent, the decoding for different pairs can also be independent. Thus four parallel JEDs (S1-PDFDs in this paper) can be used, with one S1-PDFD per pair, as depicted in Fig. 7. Each S1-PDFD takes 4 consecutive samples from the associated wire pair as input and gives an estimate of the current transmitted 4D symbol. The required decoding speed for each S1-PDFD is only 208 MHz, which means its critical path can be four times as long as that of the T-PDFD. The drawback of parallel decoding is the hardware overhead: four S1-PDFDs are now needed instead of one decoder.



Fig. 7. Parallel decoding scheme for the proposed scheme 1

To reduce the hardware overhead of parallel decoding, a folded JED (the FS1-PDFD in this paper) can be used, in which the computations of the four parallel S1-PDFDs are time-multiplexed onto a single S1-PDFD and each delay in the S1-PDFD is replaced by four delays [9]. The silicon area comes down by a factor of 4 compared with the parallel decoding structure. The critical path can be reduced by a factor of 4 after retiming the additional delays. The clock speed is then increased by a factor of 4, to 833 MHz, in order to maintain a throughput of 10 Gbps.
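The folding idea can be illustrated on a toy recursion. The sketch below is not the PDFD: a first-order loop y[n] = a*y[n-1] + x[n] stands in for the decoder's feedback loop, and the function names are hypothetical. It only shows that one shared unit whose single delay is replaced by four delays reproduces the outputs of four independent units.

```python
from collections import deque


def parallel_units(streams, a):
    """Four independent copies of y[n] = a*y[n-1] + x[n], one per stream."""
    outputs = []
    for xs in streams:
        state, ys = 0, []
        for x in xs:
            state = a * state + x
            ys.append(state)
        outputs.append(ys)
    return outputs


def folded_unit(streams, a):
    """One shared unit processing the streams in time-multiplexed order."""
    n_fold = len(streams)
    # The single delay of the original loop becomes n_fold delays.
    delay_line = deque([0] * n_fold)
    # Interleave the inputs: x0[0], x1[0], x2[0], x3[0], x0[1], ...
    interleaved = [xs[n] for n in range(len(streams[0])) for xs in streams]
    out = []
    for u in interleaved:
        z = a * delay_line.popleft() + u   # uses the value from n_fold cycles ago
        delay_line.append(z)
        out.append(z)
    # De-interleave so that the result can be compared stream by stream.
    return [out[p::n_fold] for p in range(n_fold)]


streams = [[1, 2, 3], [4, 5, 6], [7, 8, 9], [10, 11, 12]]
assert folded_unit(streams, 2) == parallel_units(streams, 2)
print(folded_unit(streams, 2)[0])   # [1, 4, 11]
```

In hardware, the benefit comes from the extra delays in the loop: they can be retimed into the combinational logic as pipeline registers, which a software model like this one cannot show.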

Note that the critical path of this design is one-fourth that of the traditional design, even though both operate at the same clock speed! However, four consecutively received samples on the same wire pair depend on one another because of ISI. Thus, the branch metric unit in the S1-PDFD is more complicated than that in the T-PDFD. Consequently, the critical path of the S1-PDFD is longer than that of the T-PDFD, and the actual speedup of the FS1-PDFD will be around 3 instead of 4. Further speedup can be achieved by combining this approach with techniques such as look-ahead and precomputation, as in [8].
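A rough way to relate these numbers, using notation introduced here only for illustration: let $T_T$ denote the critical path of the T-PDFD and $T_{S1} = \alpha T_T$ that of an S1-PDFD, where $\alpha \ge 1$ captures the extra BMU complexity. Assuming the four delays introduced by folding can be retimed evenly around the loop,

$$T_{FS1} \approx \frac{T_{S1}}{4} = \frac{\alpha T_T}{4}, \qquad \text{speedup} = \frac{T_T}{T_{FS1}} \approx \frac{4}{\alpha}.$$

A speedup of about 3 then corresponds to $\alpha \approx 4/3$, that is, an S1-PDFD critical path roughly one third longer than that of the T-PDFD.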



Fig. 8. Proposed trellis encoding scheme 2

Fig. 8 shows the proposed interleaved scheme 2. It is similar to scheme 1, but here two wire pairs share a 4D trellis encoder, so two 4D encoders are needed. The encoding speed for each encoder is 417 MHz. The four 1D symbols produced in each iteration of each encoder are transmitted over the associated two wire pairs, with two dimensions per pair. The parallel decoding scheme is shown in Fig. 9, where two JEDs (S2-PDFDs) are needed, with one decoder per two pairs. The decoding speed requirement is only 417 MHz. As in scheme 1, a folded S2-PDFD (FS2-PDFD) can be used to reduce the hardware overhead. Theoretically, the FS2-PDFD can achieve a speedup of 2 if the critical path of an S2-PDFD is the same as that of the T-PDFD.
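The symbol-to-pair mapping of scheme 2 can be sketched in the same spirit. The function name is hypothetical, and the assignment of dimensions to pairs (dimensions 0 and 1 on the first pair of an encoder, dimensions 2 and 3 on the second) is an assumption, since the text only states that each pair carries two dimensions.

```python
def scheme2_map(encoder_streams):
    """Scheme 2: two encoders, each shared by two wire pairs.

    encoder_streams[e] is the list of 4D symbols produced by encoder e
    (e = 0 or 1). Encoder e drives wire pairs 2e and 2e+1.
    ASSUMED split: dimensions 0 and 1 go to pair 2e, dimensions 2 and 3
    go to pair 2e+1, each sent in two consecutive symbol periods.
    """
    pairs = [[] for _ in range(4)]
    for e, stream in enumerate(encoder_streams):
        for d0, d1, d2, d3 in stream:
            pairs[2 * e].extend([d0, d1])
            pairs[2 * e + 1].extend([d2, d3])
    return pairs


# One 4D symbol per encoder fills two symbol periods on all four pairs.
print(scheme2_map([[(1, 3, 5, 7)], [(-1, -3, -5, -7)]]))
# [[1, 3], [5, 7], [-1, -3], [-5, -7]]
```

Each encoder therefore emits one 4D symbol every two symbol periods, which matches the 417 MHz encoding rate, half the symbol rate on the wire.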



Fig. 9. Parallel decoding scheme for the proposed scheme 2

## 4. PERFORMANCE COMPARISON

Fig. 10 shows the symbol error-rate (SER) performance of the two proposed modulation schemes and the conventional scheme. For comparison, simulation results for uncoded PAM10 transmission with DFEs are also presented. In the simulations, CAT6 measurement data for the channel model from the IEEE 802.3 10GBASE-T study group [1] is used. The postcursor channel memory length is assumed to be 20. The figure shows that the performance of the three schemes is quite close, with differences within 0.5 dB. It is expected that all of them can achieve a coding gain of around 5 dB at an SER of $10^{-10}$.



Fig. 10. Error-rate performance comparison for the three schemes

## 5. CONCLUSION

In this paper, two interleaved trellis coded modulation schemes are proposed for 10GBASE-T, which relax the inherent decoding speed requirements by factors of 4 and 2, respectively. Parallel decoding of the interleaved codes requires four decoders for scheme 1 and two for scheme 2. To reduce the hardware overhead, folded decoders are proposed where only a single decoder is needed. If the non-folded PDFDs for the proposed schemes have the same critical path as the T-PDFD, the FS1-PDFD and the FS2-PDFD can achieve speedups of 4 and 2, respectively. However, due to ISI, the complexity of the BMU of the S1-PDFDs and S2-PDFDs is increased, resulting in a longer critical path. Thus, the actual speedup may be around 3 for the FS1-PDFD and 1.5 for the FS2-PDFD. Our current work involves reducing the complexity of the BMU. Further speedup is possible by combining the proposed schemes with other techniques such as look-ahead. Simulation results show that the error-rate performances of the proposed schemes are quite close to that of the conventional scheme.

## 6. REFERENCES

- [1] IEEE P802.3 10GBASE-T Study Group (http://grouper.ieee.org/groups/802/3/10GBT/index.html).
- [2] M. Hatamian et al., "Design considerations for Gigabit Ethernet 1000Base-T twisted pair transceivers," *Proc. IEEE Custom Integrated Circuits Conference*, pp. 335-342, 1998.
- [3] E. F. Haratsch and K. Azadet, "A 1-Gb/s Joint Equalizer and Trellis Decoder for 1000BASE-T Gigabit Ethernet," *IEEE Journal of Solid-State Circuits*, vol. 36, no. 3, pp. 374-384, March 2001.
- [4] A. Duel-Hallen and C. Heegard, "Delayed decision-feedback sequence estimation," *IEEE Trans. Commun.*, vol. 37, no. 5, pp. 428-436, May 1989.
- [5] M. V. Eyuboğlu and S. U. H. Qureshi, "Reduced-state sequence estimation for coded modulation on intersymbol interference channels," *IEEE J. Selected Areas Commun.*, vol. 7, no. 6, pp. 989-995, Aug. 1989.
- [6] O. Agazzi, N. Seshadri and G. Ungerboeck, "10Gb/s PMD using PAM-5 trellis coded modulation" (http://grouper.ieee.org/groups/802/3/ae/public/mar00/index.html).
- [7] E. F. Haratsch and K. Azadet, "High-speed reduced-state sequence estimation," in *Proc. of 2000 ISCAS*, vol. 3, pp. 387-390, May 2000.
- [8] E. F. Haratsch and K. Azadet, "A pipelined 14-tap parallel decision-feedback decoder for 1000BASE-T Gigabit Ethernet," in *Proc. of Technical Papers, 2001 International Symposium on VLSI Technology, Systems, and Applications*, pp. 117-120, April, 2001.
- [9] K. K. Parhi, *VLSI Digital Signal Processing Systems: Design and Implementation*, John Wiley & Sons, Inc., New York, 1999.