# VITERBI DECODER FOR HIGH-SPEED ULTRA-WIDEBAND COMMUNICATION SYSTEMS

Jun Tang and Keshab K. Parhi

Dept. of Electrical and Computer Engineering University of Minnesota email: {tangjun, parhi}@ecc.umn.edu

# ABSTRACT

The ultra-wideband (UWB) communication systems have attracted both academic research and commercial interests due to their potential high throughput and precise ranging capability. Convoluitonal codes are widely used in different proposals for the high-speed UWB communication systems. In this paper, a novel Viterbi decoder architecture is studied for the multiband orthogonal frequency division multiplexing (MB-OFDM) UWB system [1]. In the Viterbi decoder, sliding block, 2-step lookahead and 2 parallel techniques are combined to achieve the highest desired data rate. For lower data rates, it is possible to disable some parts of the decoder for power saving by proper analysis of the effects of puncturing on word length and trace back length. At the same time, in add-compare-select (ACS) unit, pipelined most-significant bit (MSB) first ACS unit is also utilized to shorten the length of the critical path.

## 1. INTRODUCTION

The ultra-wideband (UWB) communication systems have been widely studied for commercial usage. Multiple systems have been proposed for high-speed wireless personal area network (WPAN). In these systems, convolutional codes are widely selected as the channel codes to correct the transmission errors. Specifically, a multiband orthogonal frequency division multiplexing (MB-OFDM) based system widely supported by numerous manufacturers, uses puncture convolutional codes to provide different error correcting capabilities and data rates with different puncturing patterns [1].

For the convolutional codes, the Viterbi decoder is widely used to achieve near optimal decoding performance. The decoder architecture has been widely studied. In [2], a systolic array solution with lookahead techniques was studied. With M-step lookahead, the throughput of the Viterbi decoder can be increased by M. However, the number of the branches compared in computing a state metrics increases exponentially, as M increases. The complexity and power consumption of the decoder increase significantly if M is large. Thus, in most of the modern designs, M is limited to smaller than 3. In [3], a sliding block architecture was explored to implement high-speed Viterbi decoder with decoding speed over 1Gbps. The decoding in the sliding block decoder is performed simultaneously in forward and backward direction. However, in [3], only a 4-state Viterbi decoder is implemented, while for practical 32- or 64-state Viterbi decoder, the complexity of the decoder will be extremely high due to the fully parallel ACS units and large number of skew buffer registers. Inside the Viterbi decoder, the feedback loop in the add-compare-select (ACS) unit imposes the bottleneck on the decoding speed. How to shorten the critical path in the ACS unit has been widely studied. In [4], a retiming scheme for the most significant bit (MSB) first ACS unit was analyzed in detail to achieve minimal length of the critical path.

Besides these prior work on the architecture of Viterbi decoders, relative little work has been published on punctured convolutional codes. In this paper, other than combining sliding block, lookahead, and retimed MSB-first ACS unit architectures to achieve the highest desired data rate, some more analysis is also performed on the punctured codes. The punctured pattern may affect the required word length and trace back length. The analyzed results are utilized in the design of the decoder to reduce the power consumption in lower operating data rates.

### 2. DECODER ARCHITECTURE

### 2.1. Punctured Convolutional Codes in MB-OFDM

In the MB-OFDM system, the input to the decoder is soft information from the frequency domain equalizer. To achieve different data rates and operate on different distances, a rate-1/3 convolutional code with constraint length, K = 7, is punctured to provided rate-1/2, 5/8 and 3/4 codes. The relation of codes and achievable data rates are summarized

| uata rates |      |       |              |          |     |
|------------|------|-------|--------------|----------|-----|
| Code       | 1/3  | 11/32 | 1/2          | 5/8      | 3/4 |
| Rate       |      |       |              |          |     |
| Data Rate  | 53.3 | 110   | 80, 160, 320 | 200, 400 | 480 |
| (Mbps)     |      |       |              |          |     |

 Table 1. Relation between convolutional code rates and data rates

in Table 1. The puncturing patterns are explained in [1]. From the table, the output data rates change significantly from 53.3 to 480Mbps. Due to the large variation in the data rates and code rates, it is desirable to design a Viterbi decoder architecture to both support the highest data rate with high hardware efficiency and reduce power consumption in lower data rates. In this paper, we propose a decoder architecture which can be flexibly applied to the different code rates.

### 2.2. Word Length and Trace Back Length

A Viterbi decoder is composed of 3 basic computation units: branch metrics unit (BMU), add-compare-select (ACS) unit and survival memory unit (SMU).

The branch metric unit (BMU) calculates the branch metrics  $\lambda_{i,j}(n)$ , which is the metric of the branch from state  $s_i$  to state  $s_j$  at time instance n. The branch metrics are fed into the ACS unit to calculate the state metrics  $\gamma_j(n + 1)$  for state  $s_j$ . At time instance n + 1, the state metrics can be computed recursively using:

$$\gamma_j(n+1) = \min_{\forall \lambda_{i,j}(n)} \left\{ \tilde{\gamma}_{i,j}(n+1) \right\} = \min_{\forall \lambda_{i,j}(n)} \left\{ \lambda_{i,j}(n) + \gamma_i(n) \right\}$$
(1)

A state metrics is the minimal one of intermediate state metrics (ISM),  $\tilde{\gamma}_{i,j}(n + 1)$ , which are the summations of the branch metrics,  $\lambda_{i,j}(n)$ , and the state metrics,  $\gamma_i(n)$ , at the previous time instance. The SMU processes the decisions made in the ACS and outputs the decoded data.

The word length of in each of the processing element in the ACS unit can be determined by maximum dynamic range  $\Delta_{\text{max}}$  of the state metrics [5], which is bounded by

$$\Delta_{\max} \le \lambda_{\max} \log_2 N,\tag{2}$$

where N is the number of states and  $\lambda_{\text{max}}$  is maximum branch metric for the trellis. And the word length of state metrics  $L_{\text{ACS}}$  can be computed through

$$L_{\text{ACS}} = \left\lceil \log_2(\Delta_{\text{max}} + k\lambda_{\text{max}}) \right\rceil + 1, \quad (3)$$

where k is the number of lookahead steps [6].

With some bits punctured from the original rate-1/3 codes, the value of  $\lambda_{\text{max}}$  decreases correspondingly, because these punctured bits do not provide any information to the decoder and they can be treated as zero soft inputs. Some simulation results (as summarized in Table 2) indicate that the

# Table 2. Relations of word length in the ACS unit and the code rates

| Code Rate   | 1/3 | 11/32 | 1/2 | 5/8 | 3/4 |
|-------------|-----|-------|-----|-----|-----|
| Word Length | 16  | 16    | 14  | 13  | 12  |

| Table 3. Relations of trace back lengths and the code rates |                   |     |       |     |     |     |  |
|-------------------------------------------------------------|-------------------|-----|-------|-----|-----|-----|--|
|                                                             | Code Rate         | 1/3 | 11/32 | 1/2 | 5/8 | 3/4 |  |
|                                                             | Trace Back Length | 7   | 7     | 10  | 12  | 14  |  |

difference among the different puncture patterns and code rates can be as large as 3 to 4 bits with 4-bit soft inputs and 2-step lookahead. As much as 25% saving on power consumption in the ACS units can be achieved if the non-used bits (the most significant 4 bits) are disabled for rate-3/4 codes.

Having been widely used in Viterbi decoder hardware implementations, the path merging property of Viterbi decoder makes it possible to trace back with a finite length of trace back memory. Meanwhile, to achieve negligible performance loss, the trace back length in the SMU also varies according to the code rates. Table 3 summarizes the simulated trace back length required for different code rates.

According to the analysis on the word lengths in the ACS units and trace back lengths for different code rates, it is possible to save power consumption by dynamically disabling some of the bits in the ACS unit and memory in the SMU.

# 2.3. Combined Lookahead, Sliding Block and Parallel Processing

Several techniques have been developed to increase the decoding speed with a relatively low clock frequency. These techniques include lookahead [2], sliding block [3] and parallel processing [7]. As the lookahead level increases, the complexity of ACS units increases exponentially and the complexity of the parallel processing increases linearly as the parallelism increases. The increase of the implementation complexity makes the chip not only more expensive but also more power-consuming. Thus, the level of lookahead and parallelism must be limited to the minimum extent. The sliding block technique can be used to increase the throughput of the decoder by performing the decoding process simultaneously in forward and backward direction. However, in [3], only a 4-state Viterbi decoder is implemented. As the number of the states, the constraint length and trace back length increase in real applications, the numbers of required ACS units and skew buffer registers becomes prohibitively high. For the highest 480Mbps decoding speed in MB-OFDM, it is feasible to replace the fully parallel ACS architecture to a serial architecture by utilizing the folding technique and replace the skew buffer registers following ACS units with trace back memory.



Fig. 1. The proposed architecture.

The proposed decoder architecture, which combines 2step lookahead, 2-level parallelism and serialized sliding block techniques, is illustrated in Fig. 1. The whole decoder is divided into 3 blocks. Block 1 and 2 are combined as the first branch of the parallel decoding and block 3 is the second branch of the parallel decoding. The two branches of the parallel architecture share the same trace back memory through time multiplexing. Blocks 2 and 3 are enabled/disabled according to the required output data rate.

### 2.3.1. 400 – 480 Mbps

In the highest speed operating modes, all of the 3 modules are enabled. The two parallel branches decode the continuous data stream alternatively as illustrated in Fig. 2(a). With the start-time difference between the two branches, it is possible to let them share the same trace back memory as shown in Fig. 3. With the time-multiplexing technique, 6 single-port memory units with size  $L_M = 0.5L_{sp}$  ( $L_{sp}$  is the survival path length) are combined together to form the trace-back memory, where half of them are for the decoding in forward direction and the other half are for backward direction. In Fig. 3, only the forward direction decoding is illustrated while the backward direction is symmetric. Only one read or write operate is performed by either branch on any single-port memory unit at any given time instance.

### 2.3.2. 110 - 320 Mbps

In this data rate scope, the second branch (block 3) is disabled to save power. The architecture is then reduced back to the serialized sliding block Viterbi decoder with trace back memory of  $L_{\text{M.total}} = 6L_{sp}$ . In this case, the decoding process on a continuous data stream is illustrated in



**Fig. 2**. Continuous stream processing for different data rate: (a) Sliding block decoding with 2 branches and (b) sliding block decoding with single branches.

Fig. 2(b).

### 2.3.3. Below 110 Mbps

In this data rate scope, not only the second branch is disabled but also the backward processing ACS unit in the first branch (block 2) is disabled. At the same time, the soft input buffer memory is also disabled. In this case, the decoder is working in the continuous mode. The trace back memory is then reorganized to three parts for writing new data, traceback read and decode read as explained in [8].

According to the above analysis and Table 2, the second parallel branch only works for code rate greater than 5/8. The ACS units in the second branch can have a word length of 13 or 12. While the ACS units in the first branch need to work in all supported code rates, they have to support largest word length 16, while some of the bits can be disabled when the decoder operates in high code rates to same power consumption.

## 2.4. Structures of ACS Unit and Minimum State Estimate Unit

The ACS unit design in the Viterbi decoder for MB-OFDM system utilizes the retimed MSB-first architecture, as illustrated in Figure 4. In the structure, the ACS unit is composed of adder, code converter (CC) and minimum selec-



**Fig. 3**. Time multiplex usage of trace back memory with highest output data rates (400 and 480 Mbps)



Fig. 4. Bit-wise structure of Retimed MSB-first ACS unit

tion (MS). The delay elements between MS in Figure 4 are carefully placed according to the analysis in [4].

The minimum state estimate unit (MSEU) operates at the junction of the forward and backward decoding to find the state with minimal state metrics before the traceback procedure starts. The MSEU can be treated as an modified ACS unit because they perform the same operations: add and compare. However, in the MSEU, with constraint length 7, we need to compare 64 state metrics which much more that compared in ACS units. Directly expanding the MS unit in the ACS unit to 64-state-metric comparison, the complexity of MS unit would be prohibitively high. Meanwhile, different from the ACS unit, there is no feed back loop in the MSEU because its outputs are directly fed to the SMU. A 3-level MS unit for the MSEU is proposed. At the first level, the 64 states are compared in 16 groups of 4 states. The 16 outputs of the first level are compared at the second level also in 4 groups of 4 states. At the final step, the 4 outputs of the second level are fed into the last level of the MS unit to obtain the state with the minimal metrics.

The two parallel branches of the decoder share the same MSEU in a time-multiplex manner to reduce the hardware complexity. At the same time, for data rates smaller lower than 110Mbps, the MSEU can be used to accelerate the convergence speed in SMU and boost the BER performance of the decoder.

#### 3. CONCLUSION

In this paper, after analyzing the effects of the puncture patterns on word length in the ACS unit and trace back length in the SMU, a Viterbi decoder architecture is proposed to both provide highest data rate of 480 Mbps and save power in lower data rates for MB-OFDM UWB wireless communication systems. The architecture combines the sliding block, 2-step lookahead and 2-level parallelism techniques. The trace back memory and the MSEU are shared by both parallel branches to reduce hardware complexity.

### 4. REFERENCES

- A. Batra, J. Balakrishnan, A. Dabak, R. Gharpurey, J. Lin, P. Fontaine, J.-M. Ho, S. Lee, M. Frechette, S. March, and H. Yamaguchi, "Physical layer submission to 802.15 task group 3a: Multi-band orthogonal frequency division multiplexing," Sept. 2004.
- [2] G. Fettweis and H. Meyr, "High-rate Viterbi processor: A systolic array solution," *IEEE Journal on Selected Areas in Communications*, vol. 8, no. 8, pp. 1520–1534, Oct. 1990.
- [3] P. J. Black and T. H.-Y. Meng, "A 1-Gb/s, four-state, sliding block Viterbi decoder," *IEEE Journal of Solid-State Circuits*, vol. 32, no. 6, pp. 797–805, June 1997.
- [4] K. K. Parhi, "An improved pipelined MSB-first addcompare select unit structure for Viterbi decoders," *IEEE Transactions on Circuits and Systems I: Fundamental Theory and Applications*, vol. 51, no. 3, pp. 504–511, Mar. 2004.
- [5] A. P. Hekstra, "An alternative to metric rescaling in Viterbi decoders," *IEEE Transactions on Communications*, vol. 37, no. 11, pp. 1220–1222, Nov. 1989.
- [6] P. Black and T. Meng, "A 140Mb/s 32-state radix-4 Viterbi decoder," in *IEEE International Solid-State Circuits Conference*, Feb. 1992, pp. 70–71.
- [7] G. Fettweis and H. Meyr, "Parallel Viterbi algorithm implementation: Breaking the ACS-bottleneck," *IEEE Transactions on Communications*, vol. 37, no. 8, pp. 785–790, Aug. 1989.
- [8] G. Feygin and P. Gulak, "Architecture tradeoffs for survivor sequence memory management in Viterbi decoders," *IEEE Transactions on Communications*, vol. 41, no. 3, pp. 425–429, Mar. 1993.