## EFFICIENT ARCHITECTURE MAPPING OF FFT/IFFT FOR COGNITIVE RADIO NETWORKS

Guohui Wang<sup>1</sup>, Bei Yin<sup>1</sup>, Inkeun Cho<sup>2</sup>, Joseph R. Cavallaro<sup>1</sup>, Shuvra Bhattacharyya<sup>2</sup>, Jarmo Takala<sup>3</sup>

<sup>1</sup>Department of Electrical and Computer Engineering, Rice University, Houston, TX, USA <sup>2</sup>Department of Electrical and Computer Engineering, University of Maryland, College Park, MD, USA <sup>3</sup>Department of Pervasive Computing, Tampere University of Technology, Finland

#### ABSTRACT

Cognitive radio networks require flexibility to support a variety of wireless communication system standards. Many modern systems utilize some form of orthogonal frequency division multiplexing (OFDM) and single-carrier frequency-division multiple access (SC-FDMA), often augmented with multiple-input multiple-output (MIMO) antenna schemes. A common module in these standards is the fast Fourier transform (FFT) and its inverse. Although many architectures exist for traditional power-of-two FFT lengths, the recent 3GPP LTE standards define non-power-of-two transform lengths. The various FFT and IFFT lengths for both uplink and downlink processing require support for radix-2, radix-3, and radix-5 modules. In this paper, we propose a highly flexible FFT/IFFT architecture that supports a broad variety of transform sizes and maps efficiently to programmable testbed platforms for cognitive radio networks. This novel architecture provides a range of transform sizes of the general form $(2^n 3^k 5^l)$ and is intended for use in emerging algorithms for massive MIMO detectors.

*Index Terms*—Discrete Fourier transform, FFT/IFFT, 3GPP LTE/LTE-Advanced, VLSI architecture, pipelined architecture.

## 1. INTRODUCTION

Cognitive radio networks and software-defined radio (SDR) systems require flexibility to support a variety of wireless communication system standards. They must also be reconfigurable to support different system configurations with variable bandwidths and modulation schemes. Meanwhile, emerging standards propose higher data rates with advanced multiple-input multiple-output (MIMO) antenna technology. It is therefore challenging to design and implement reconfigurable fast Fourier transform (FFT) and inverse fast Fourier transform (IFFT) DSP modules that sustain high throughput.

A large body of prior research exists on the efficient design of FFTs for cognitive radio networks and software-defined radio, and on mapping these designs onto specific platforms. Zhang et al. presented a computationally efficient FFT that exploits the fact that zero-valued inputs and outputs can make up a large portion of the data in cognitive radio [1]. Their scheme was implemented on heterogeneous platforms and on a specific processor. Liang and Huang proposed an algorithm for mapping a parallel FFT onto SmartCell, a coarse-grained reconfigurable architecture [2]. These studies focus on mapping the FFT onto specific platforms, so they are limited to devices that use those architectures for cognitive radio. Other work addresses the FFT architecture itself, which is more general: Yoshizawa, Nishi, and Miyanaga presented an FFT processor design with parallel input and output for cognitive radio systems [3]. It is limited to transform sizes of $2^n$ and cannot process transforms whose size is not $2^n$. Because the FFT input arrives as a stream in wireless communication, an architecture with serial input and output can be a high-throughput and resource-efficient design.

This paper provides a novel perspective on the state-of-the-art and emerging trends in mapping FFT/IFFT modules for cognitive radio networks. We discuss the system performance requirements of the 3GPP LTE-Advanced specifications and of massive MIMO detection systems, and present a configurable and scalable FFT architecture based on a unified radix unit, which covers radix-2, 3, 4, 5, and 7, to fulfill these requirements.

## 2. SYSTEM MODEL

The orthogonal frequency division multiplexing (OFDM) and single-carrier frequency-division multiple access (SC-FDMA, also called FFT-spread OFDM) technologies have been widely adopted in modern wireless communication systems, such as 3GPP long term evolution (LTE), IEEE 802.11n WiFi, and IEEE 802.16m WiMAX. The FFT and IFFT are essential DSP modules for these OFDM and SC-FDMA systems. In this section, we describe the system model of the 3GPP LTE/LTE-Advanced physical layer systems. We analyze the performance requirement of the FFT/IFFT modules, as well as the block lengths required to support different modes in these standards.

**Fig. 1**. System diagram of 3GPP LTE/LTE-Advanced uplink transmitter and receiver.

**Table 1**. Bandwidth and resource configuration for the LTE/LTE-Advanced uplink receiver.

| Bandwidth [MHz]         | 1.4 | 3   | 5   | 10   | 15   | 20   |
|-------------------------|-----|-----|-----|------|------|------|
| Num. of resource blocks | 6   | 12  | 25  | 50   | 75   | 100  |
| Occupied subcarriers    | 72  | 180 | 300 | 600  | 900  | 1200 |
| IFFT size (N)           | 72  | 180 | 300 | 600  | 900  | 1200 |
| FFT size (M)            | 128 | 256 | 512 | 1024 | 1536 | 2048 |

### 2.1. 3GPP LTE/LTE-Advanced Uplink

SC-FDMA is employed in the 3GPP LTE/LTE-Advanced wireless communication standards [4, 5, 6]. The system diagram of SC-FDMA is shown in Fig. 1. On the transmitter side, the data streams are coded and modulated based on the system configuration. The modulation symbols are processed with FFT transform precoding, which creates SC-FDMA symbols. The FFT spreading performs an N-point FFT, where N is the scheduled bandwidth for the user equipment (UE) in terms of the number of subcarriers. The resulting SC-FDMA symbols are mapped to the allocated resource elements, first along frequency and then in symbol order. Finally, SC-FDMA modulation is done by performing an M-point IFFT and adding a cyclic prefix, where M is the number of subcarriers for the deployed channel configuration. The receiver structure is the inverse of the transmitter.
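The transmit chain described above (N-point FFT precoding, subcarrier mapping, M-point IFFT, cyclic prefix) and its inverse at the receiver can be sketched as follows. This is a minimal illustration rather than the implementation used in this work: it uses a direct $O(N^2)$ DFT for clarity, and the contiguous mapping starting at `first_subcarrier`, the cyclic prefix length `cp_len`, and all function names are assumptions made for the example.

```python
import cmath


def dft(x, inverse=False):
    """Direct DFT (or inverse DFT with 1/n scaling) of a list of complex values."""
    n = len(x)
    sign = 1.0 if inverse else -1.0
    out = []
    for k in range(n):
        acc = sum(x[t] * cmath.exp(sign * 2j * cmath.pi * t * k / n) for t in range(n))
        out.append(acc / n if inverse else acc)
    return out


def scfdma_modulate(symbols, m, first_subcarrier=0, cp_len=4):
    """N-point FFT spreading, subcarrier mapping, M-point IFFT, cyclic prefix."""
    n = len(symbols)
    if first_subcarrier + n > m:
        raise ValueError("allocation does not fit in the M-point grid")
    spread = dft(symbols)                      # N-point FFT precoding
    grid = [0j] * m                            # unallocated subcarriers stay zero
    grid[first_subcarrier:first_subcarrier + n] = spread
    time = dft(grid, inverse=True)             # M-point IFFT
    return time[-cp_len:] + time               # prepend cyclic prefix


def scfdma_demodulate(samples, n, m, first_subcarrier=0, cp_len=4):
    """Inverse chain: drop prefix, M-point FFT, de-mapping, N-point IFFT."""
    time = samples[cp_len:cp_len + m]
    grid = dft(time)                           # M-point FFT
    allocated = grid[first_subcarrier:first_subcarrier + n]
    return dft(allocated, inverse=True)        # N-point IFFT


# Toy sizes (N = 12, M = 16) keep the example fast; LTE uses the sizes in Table 1.
qpsk = [complex(1 - 2 * (i % 2), 1 - 2 * ((i // 2) % 2)) for i in range(12)]
tx = scfdma_modulate(qpsk, m=16, first_subcarrier=2)
rx = scfdma_demodulate(tx, n=12, m=16, first_subcarrier=2)
```

Over an ideal channel, `rx` matches `qpsk` up to floating-point error, which illustrates why the receiver needs an M-point FFT followed by an N-point IFFT.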

The resource configuration and the FFT/IFFT sizes for the different bandwidths used in LTE are listed in Table 1. A unified reconfigurable FFT/IFFT module designed for cognitive radio and SDR systems should support all the FFT/IFFT sizes listed in Table 1. Only a few of the lengths shown in Table 1 are powers of two, while the others have factors of 3 and 5. Table 2 shows the number of radix-2, radix-3, and radix-5 stages obtained by factorizing the FFT/IFFT sizes for the LTE/LTE-Advanced standards.
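The stage counts follow directly from the prime factorization of each transform length. The short sketch below derives them for the sizes in Table 1; the function name `radix_stages` is hypothetical, and the code counts prime-radix stages only (it does not merge pairs of radix-2 stages into radix-4 stages).

```python
def radix_stages(size, radices=(2, 3, 5)):
    """Return {radix: number of stages} for a size of the form 2^n * 3^k * 5^l."""
    counts = {}
    remaining = size
    for r in radices:
        counts[r] = 0
        while remaining % r == 0:
            remaining //= r
            counts[r] += 1
    if remaining != 1:
        raise ValueError(f"{size} has a prime factor outside {radices}")
    return counts


# Transform sizes taken from Table 1.
N_SIZES = (72, 180, 300, 600, 900, 1200)
M_SIZES = (128, 256, 512, 1024, 1536, 2048)

for size in N_SIZES + M_SIZES:
    stages = radix_stages(size)
    print(f"{size:5d}: radix-2 x{stages[2]}, radix-3 x{stages[3]}, radix-5 x{stages[5]}")
```

For example, $1200 = 2^4 \cdot 3 \cdot 5^2$ needs four radix-2 stages, one radix-3 stage, and two radix-5 stages, while $1536 = 2^9 \cdot 3$ is the only M-point size that is not a power of two.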

In the 3GPP LTE-Advanced standard, 100 MHz bandwidth can be used with carrier aggregation technology. With 100 MHz bandwidth, a maximum uplink data rate of 1.5 Gb/s is specified with the 64-QAM modulation scheme and $4 \times 4$ MIMO antennas for each 20 MHz bandwidth. Since each 64-QAM symbol carries 6 bits, meeting this throughput requirement means that the *N*-point IFFT in the receiver should run at more than 250 MS/s (mega symbols per second). Accordingly, the *M*-point FFT, which processes $M/N = 2048/1200$ times as many samples, should operate at no less than 427 MS/s to achieve the maximum throughput requirement.

**Table 2**. Configurations of radix-2, radix-3, and radix-5 stages for different FFT/IFFT block lengths in LTE/LTE-Advanced.

**Fig. 2**. Base-station system diagram of LTE/LTE-A uplink for 20 MHz bandwidth, with large-scale MIMO technology.

### 2.2. Large-Scale MIMO System for 3GPP LTE/LTE-A Uplink

Large-scale MIMO is an emerging technique for wireless communication proposed in [7]. The base station (BS) is equipped with a large number of antennas to serve a small number of users at the same time and in the same frequency band. Compared to conventional small-scale MIMO systems, large-scale MIMO can potentially improve spectral efficiency and link reliability. In addition, large-scale MIMO systems have the potential to use low-complexity detection methods in the uplink and to reduce power consumption and hardware costs at the base station [7, 8, 9]. Furthermore, this technique can be readily applied to the LTE-A system.

In the large-scale MIMO LTE-A uplink, each user first encodes its information bits with a channel encoder. The encoded bits are then mapped to symbols $s$ on the constellation points in the set $\mathcal{O}$. The symbol vector $\mathbf{s} = [s_1, \ldots, s_U]^T$ with $\mathbf{s} \in \mathcal{O}^U$ contains the symbols for all $U$ users. At each user, these symbols are first converted to the frequency domain using an FFT, then mapped onto the subcarriers allocated to that user, and then converted back to the time domain using an IFFT. The signal is then transmitted over the wireless channel. The received signal at the BS can be modeled as $\mathbf{y} = \mathbf{Hs} + \mathbf{n}$, where $\mathbf{y} = [y_1, \ldots, y_B]^T$ corresponds to the received vector, $B$ is the number of antennas at the BS, $\mathbf{H} \in \mathbb{C}^{B \times U}$ is the uplink channel matrix (where $B \gg U$), and $\mathbf{n} \in \mathbb{C}^{B}$ models additive noise at the BS; the entries of $\mathbf{n}$ are assumed to be i.i.d. zero-mean Gaussian with variance $N_0$.

**Table 3.** Aggregate throughput performance requirement for FFT/IFFT for 3GPP LTE-A uplink receiver and large-scale MIMO systems (100 MHz bandwidth).

| MIMO scheme    | FFT throughput requirement | IFFT throughput requirement |
|----------------|----------------------------|-----------------------------|
| $4 \times 4$   | 427 MS/s                   | 250 MS/s                    |
| $128 \times 8$ | 13.65 GS/s                 | 500 MS/s                    |

The received signal is first converted into the frequency domain using an FFT. After subcarrier de-mapping and equalization, the signals are converted back to the time domain using an IFFT. As mentioned, according to the LTE-A standard, the FFT and IFFT should sustain 427 MS/s and 250 MS/s, respectively. In a large-scale MIMO setting, for example a $128 \times 8$ system (128 antennas at the base station and 8 antennas at the UEs), the aggregate throughput requirements for the FFT and IFFT become 13.65 GS/s and 500 MS/s. For the FFT in particular, this is equivalent to using 32 FFT modules, each running at 427 MS/s, which challenges the design in both area and throughput for a real system. The throughput performance requirements for the FFT and IFFT are summarized in Table 3.
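The figures in Table 3 can be reproduced with a few lines of arithmetic. The sketch below is a back-of-the-envelope check, not part of the proposed design: it assumes 6 bits per 64-QAM symbol, the 20 MHz sizes $N = 1200$ and $M = 2048$ from Table 1, and a scaling rule inferred from Table 3 in which the aggregate FFT rate grows with the number of base-station antennas and the aggregate IFFT rate with the number of user antennas, both relative to the $4 \times 4$ baseline. The function and parameter names are hypothetical.

```python
def uplink_transform_rates(bs_antennas, ue_antennas,
                           data_rate_bps=1.5e9, bits_per_symbol=6,
                           n_size=1200, m_size=2048, baseline=4):
    """Return (fft_rate, ifft_rate) in samples per second for the uplink receiver."""
    # Data symbols per second carried by the baseline 4x4 configuration.
    symbol_rate = data_rate_bps / bits_per_symbol
    # The N-point IFFT handles the data symbols themselves.
    ifft_rate = symbol_rate * ue_antennas / baseline
    # The M-point FFT handles M samples for every N data symbols, per BS antenna group.
    fft_rate = symbol_rate * m_size / n_size * bs_antennas / baseline
    return fft_rate, ifft_rate


for bs, ue in ((4, 4), (128, 8)):
    fft_rate, ifft_rate = uplink_transform_rates(bs, ue)
    print(f"{bs} x {ue}: FFT {fft_rate / 1e6:.1f} MS/s, IFFT {ifft_rate / 1e6:.1f} MS/s")
```

Under these assumptions the $128 \times 8$ FFT requirement is exactly 32 times the $4 \times 4$ one, matching the 32 parallel FFT modules mentioned in the text.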

## 3. ADVANCED FFT/IFFT ARCHITECTURES FOR COGNITIVE RADIO NETWORKS

In wireless communication systems employing the OFDM and SC-FDMA schemes, the FFT and IFFT are indispensable digital signal processing (DSP) modules. Traditionally, power-of-two lengths are defined for the FFT/IFFT in DSP systems, in which case only radix-2 FFT kernels are required for VLSI implementations. However, non-power-of-two lengths have been employed in recent standards. As shown in Table 1, the 3GPP LTE and LTE-Advanced uplink uses transforms of sizes 128, 256, 512, 1024, 1536, and 2048 and of sizes 72, 180, 300, 600, 900, and 1200 [4, 5]; at the transmitter the former are IFFTs and the latter are FFTs, and the roles are reversed at the receiver. To support these FFT/IFFT lengths, radix-3 and radix-5 FFT stages are necessary in addition to the radix-2 FFT kernels.
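To illustrate why radix-3 and radix-5 stages are enough to cover these lengths, the following sketch implements a generic recursive mixed-radix decimation-in-time FFT for sizes of the form $2^n 3^k 5^l$. It is a software reference model only, not the pipelined hardware architecture discussed in this paper; the butterflies are written as direct small DFTs rather than optimized Winograd kernels, and all names are illustrative.

```python
import cmath


def smallest_radix(n, radices=(2, 3, 5)):
    """Pick the first supported radix that divides n."""
    for r in radices:
        if n % r == 0:
            return r
    raise ValueError(f"{n} is not of the form 2^n * 3^k * 5^l")


def mixed_radix_fft(x):
    """Recursive decimation-in-time FFT using radix-2, radix-3 and radix-5 stages."""
    n = len(x)
    if n == 1:
        return list(x)
    r = smallest_radix(n)
    m = n // r
    # Split into r interleaved sub-sequences and transform each (m-point FFTs).
    subs = [mixed_radix_fft(x[j::r]) for j in range(r)]
    out = [0j] * n
    for k in range(m):
        # Twiddle-factor multiplication in front of the radix-r butterfly.
        t = [subs[j][k] * cmath.exp(-2j * cmath.pi * j * k / n) for j in range(r)]
        # Radix-r butterfly: an r-point DFT across the sub-sequence outputs.
        for q in range(r):
            out[k + q * m] = sum(
                t[j] * cmath.exp(-2j * cmath.pi * j * q / r) for j in range(r)
            )
    return out


def naive_dft(x):
    """Direct O(n^2) DFT, used only to check the fast version."""
    n = len(x)
    return [
        sum(x[t] * cmath.exp(-2j * cmath.pi * t * k / n) for t in range(n))
        for k in range(n)
    ]


# Check one non-power-of-two LTE length against the direct DFT.
samples = [complex((i * 7) % 5 - 2, (i * 3) % 4 - 1) for i in range(72)]
max_error = max(abs(a - b) for a, b in zip(mixed_radix_fft(samples), naive_dft(samples)))
```

Each level of the recursion corresponds to one radix stage followed by a twiddle-factor multiplication, which is the same stage structure a pipelined implementation would unroll in hardware.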

### 3.1. Existing IFFT/FFT architectures for OFDM systems

There are numerous implementations of power-of-two IFFTs and FFTs, and various architectures have been developed for wireless communication systems [10, 11, 12, 13, 14]. These architectures are based on efficient implementations of radix-2 and radix-4 computations, and architectures supporting several transform lengths have been introduced. Recently, implementations supporting non-power-of-two IFFT/FFT sizes have also been reported. In the Altera FFT IP MegaCore Function, an FFT architecture for wireless communication systems is described, which was developed as an intellectual property block for FPGA circuits. It supports FFT sizes of $2^n 3^k$ where $k \in \{0,1\}$. The support for multiples of three is realised with a 512-point FFT core and a separate radix-3 computation unit, which is integrated with the aid of additional buffer memories. Another architecture is the Xilinx LogiCORE IP Discrete Fourier Transform. This is a more flexible solution and supports transform sizes of $2^n 3^k 5^l$. The architecture contains a single butterfly unit, which supports computation of radix-2, radix-3, and radix-5 butterflies. Computing a radix-5 butterfly takes two cycles, while the other butterflies are computed in one cycle. Unfortunately, these solutions are not scalable, i.e., the performance of the system is fixed and can be improved only by instantiating two units and executing two independent FFT streams in parallel. Such an approach is not area-efficient, in particular from the memory point of view. These architectures are area-efficient, but they have the disadvantages of long latency and excessive memory use when processing FFTs at run-time.

In [15], a serial-input, serial-output FFT pipeline is proposed that combines radix-2 and radix-3 computations and requires only $(N-1)$ memory elements for an N-point transform. This architecture supports the FFT sizes in the 3GPP LTE specification but not the IFFT sizes. Moreover, this architecture is static: the number of points can be configured at compile-time but not at run-time, so it also has a scalability issue.

### 3.2. Requirements for IFFT/FFT performance in 3GPP LTE systems

The baseband processing in a wireless communication system calls for a high processing rate based on stream processing. When selecting an architecture for such systems, it is essential that the architecture be scalable so that the throughput and latency requirements can be fulfilled with a minimum set of computing resources. In particular, architectures often differ in the amount of storage needed for intermediate results and twiddle factor coefficients. Quite often ping-pong buffers are used, i.e., for an N-point transform, 2N memory locations are reserved.

Because baseband processing in a wireless communication system is naturally stream processing, a pipelined architecture is suitable for FFT computations in receivers. To address the scalability and memory limitations described above, we propose a scalable FFT architecture with a unified radix butterfly structure, as shown in Fig. 3.

#### 3.2.1. Proposed architecture timing requirement

**Fig. 3**. Proposed architecture for FFT.

A serial input and output architecture requires $2 \cdot (2^n 3^k 5^l)$ cycles for the radix operations of a transform of size $2^n 3^k 5^l$. Each twiddle factor multiplication between radix units needs 2 cycles, so the multiplications require $2(n + k + l)$ cycles in total. For the 1200-point IFFT case ($1200 = 2^4 \cdot 3 \cdot 5^2$), this gives $2 \cdot 1200 + 2 \cdot 7 = 2414$ cycles. The required throughput of the 3GPP LTE uplink is 250 MS/s, as shown in Table 3. For a 1200-point IFFT, 1200 data symbols should therefore be processed within 4.8 $\mu$s. Defining the clock period as $t_{clk}$, the required $t_{clk}$ can be calculated as follows:

$$t_{clk} = \frac{4.8}{2414} \times 10^{-6}\,\mathrm{s} = 1.988 \times 10^{-9}\,\mathrm{s}, \tag{1}$$

and the maximum frequency  $f_{max}$  is

$$f_{max} = \frac{1}{t_{clk}} \approx 500\,\mathrm{MHz}. \tag{2}$$
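The same calculation can be repeated for any supported transform size. The helper below is a small sketch of the cycle model stated above ($2 \cdot 2^n 3^k 5^l$ cycles for the radix operations plus $2(n + k + l)$ cycles for the twiddle multiplications); the function name and the return layout are assumptions made for illustration.

```python
def required_clock(n, k, l, symbol_rate_hz=250e6):
    """Return (size, cycles, t_clk, f_clk) for a 2^n * 3^k * 5^l point transform."""
    size = 2 ** n * 3 ** k * 5 ** l
    radix_cycles = 2 * size                 # serial input/output radix operations
    twiddle_cycles = 2 * (n + k + l)        # 2 cycles per twiddle multiplication stage
    cycles = radix_cycles + twiddle_cycles
    window_s = size / symbol_rate_hz        # time available for one block of symbols
    t_clk = window_s / cycles               # required clock period in seconds
    return size, cycles, t_clk, 1.0 / t_clk


# 1200-point IFFT: 1200 = 2^4 * 3^1 * 5^2
size, cycles, t_clk, f_clk = required_clock(4, 1, 2)
print(f"{size}-point: {cycles} cycles, t_clk = {t_clk * 1e9:.3f} ns, f = {f_clk / 1e6:.1f} MHz")
```

For the 1200-point case this evaluates to 2414 cycles and a clock period of about 1.988 ns, that is, a clock frequency of roughly 503 MHz, consistent with the rounded 500 MHz figure in (2).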

#### 3.2.2. Unified radix unit, and scalability and configurability of the IFFT/FFT architecture

To support a scalable and configurable FFT architecture, our unified radix unit builds on the unified structure for radix-2, 3, 4, 5, and 7 presented by Qureshi, Garrido, and Gustafsson, which is based on the Winograd Fourier transform [16]. The unified radix unit allows a system to configure the number of points easily at run-time, giving it the flexibility to support multiple FFT transform sizes within a single architecture. In addition, a memory of configurable size can be connected to each radix unit depending on the number of points of the FFT. A configurable twiddle factor generation unit is connected between radix units, and it can be implemented in two different ways: with memory-based twiddle factors or with CORDIC. The memory-based method improves the accuracy of the system; however, it brings about excessive memory usage and memory access issues when multiple FFT sizes must be supported. The CORDIC-based method, on the other hand, scales easily to multiple FFT sizes, but it is more complex than the memory-based method, so the user should select a suitable method depending on system requirements. To support multiple FFT sizes, the architecture has bypasses that connect only the radix units required for the configured FFT. This architecture can process an N-point FFT with N-1 memory elements, and it can support multiple FFT sizes within a single architecture.
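The trade-off between the two twiddle factor options can be made concrete with a small software model. The sketch below generates $W_N^k = e^{-j 2\pi k / N}$ with a rotation-mode CORDIC and compares it with a directly computed value, which stands in for a memory-based table. It is a floating-point illustration under stated assumptions (16 iterations, quadrant folding before the iterations, hypothetical function names), not the fixed-point unit of the proposed architecture.

```python
import cmath
import math


def cordic_cos_sin(angle, iterations=16):
    """Rotation-mode CORDIC: approximate (cos, sin) for |angle| <= pi/2."""
    # Pre-compensate the CORDIC gain so the result has unit magnitude.
    gain = 1.0
    for i in range(iterations):
        gain /= math.sqrt(1.0 + 2.0 ** (-2 * i))
    x, y, z = gain, 0.0, angle
    for i in range(iterations):
        d = 1.0 if z >= 0 else -1.0
        # Shift-and-add micro-rotation by +/- atan(2^-i).
        x, y = x - d * y * 2.0 ** -i, y + d * x * 2.0 ** -i
        z -= d * math.atan(2.0 ** -i)
    return x, y


def twiddle_cordic(k, n, iterations=16):
    """Approximate exp(-2j*pi*k/n) for any transform size n, without a table."""
    angle = -2.0 * math.pi * (k % n) / n      # in (-2*pi, 0]
    sign = 1.0
    if angle < -1.5 * math.pi:
        angle += 2.0 * math.pi                # fourth-to-first quadrant wrap
    elif angle < -0.5 * math.pi:
        angle += math.pi                      # rotate by pi and negate the result
        sign = -1.0
    c, s = cordic_cos_sin(angle, iterations)
    return complex(sign * c, sign * s)


def twiddle_table(n):
    """Memory-based alternative: one stored entry per index for a given size."""
    return [cmath.exp(-2j * cmath.pi * k / n) for k in range(n)]


table = twiddle_table(1200)
worst = max(abs(twiddle_cordic(k, 1200) - table[k]) for k in range(1200))
```

The table needs one entry per twiddle index for every supported size, whereas the CORDIC routine takes the size as a run-time parameter at the cost of an iterative datapath and a residual error on the order of $2^{-15}$ for 16 iterations.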

## 4. CONCLUSION

In this paper, we have discussed the issue of designing high-performance FFT/IFFT modules for cognitive radio communication and network systems, and presented a novel design for a configurable FFT/IFFT module that provides scalability and reconfigurability. Specifically, a unified radix structure for radix-2, 3, 4, 5, and 7 is proposed. The bypasses between radix units and the configurable memory of each radix unit enable scalability to support a variety of communication system specifications with multiple FFT/IFFT transform sizes. Our scalable and configurable FFT/IFFT architecture uses streamlined processing, which is well suited to the streaming nature of data generation in wireless communication systems.

## 5. REFERENCES

- [1] Q. Zhang, A. B. J. Kokkeler, and G. J. M. Smit, "An efficient FFT for OFDM based cognitive radio on a reconfigurable architecture," in *Proc. IEEE ICC*, 2007, pp. 6522–6526.
- [2] C. Liang and X. Huang, "Mapping parallel FFT algorithm onto smartcell coarse-grained reconfigurable architecture," in *Proc. IEEE ASAP*, 2009, pp. 231–234.
- [3] S. Yoshizawa, K. Nishi, and Y. Miyanaga, "Reconfigurable two-dimensional pipeline FFT processor in OFDM cognitive radio systems," in *Proc. IEEE ISCAS*, 2008, pp. 1248–1251.
- [4] 3rd Generation Partnership Project; Technical Specification Group Radio Access Network; Evolved Universal Terrestrial Radio Access (E-UTRA); Multiplexing and channel coding (Release 9). 3GPP Organizational Partners TS 36.212 Rev. 8.3.0, May 2008.
- [5] 3rd Generation Partnership Project; Technical Specification Group Radio Access Network; Evolved Universal Terrestrial Radio Access (E-UTRA); Physical Layer Procedures (Release 10). 3GPP Organizational Partners TS 36.213 version 10.10.0, Jul. 2013.
- [6] G. Wang, B. Yin, K. Amiri, Y. Sun, M. Wu, and J. R. Cavallaro, "FPGA prototyping of a high data rate LTE uplink baseband receiver," in *Conference Record of the 43rd Asilomar Conference on Signals, Systems and Computers*, Nov. 2009, pp. 248–252.
- [7] T. L. Marzetta, "Noncooperative cellular wireless with unlimited numbers of base station antennas," *IEEE Trans. Wireless Commun.*, vol. 9, no. 11, pp. 3590–3600, Nov. 2010.
- [8] H. Q. Ngo, E. G. Larsson, and T. L. Marzetta, "Energy and spectral efficiency of very large multiuser MIMO systems," *arXiv preprint: 1112.3810v2*, May 2012.

- [9] B. Yin, M. Wu, C. Studer, J. R. Cavallaro, and C. Dick, "Implementation trade-offs for linear detection in large-scale MIMO systems," in *Proc. IEEE ICASSP*, Vancouver, BC, May 2013, pp. 2679–2683.
- [10] S. He and M. Torkelson, "Designing pipeline FFT processor for OFDM demodulation," in URSI Int. Symp. Signals, Systems, and Electronics, 1998, pp. 257–262.
- [11] Y.-W. Lin and C.-Y. Lee, "Design of an FFT/IFFT processor for MIMO OFDM systems," *IEEE Trans. Circuits and Systems I: Regular Papers*, vol. 54, no. 4, pp. 807–815, 2007.
- [12] H. Jiang, H. Luo, J. Tian, and W. Song, "Design of an efficient FFT processor for OFDM systems," *IEEE Trans. Consumer Electronics*, vol. 51, no. 4, pp. 1099–1103, 2005.
- [13] T. Sansaloni, A. Perez-Pascual, V. Torres, and J. Valls, "Efficient pipeline FFT processors for WLAN MIMO-OFDM systems," *IET Electronics Letters*, vol. 41, no. 19, pp. 1043–1044, 2005.
- [14] I. Cho, C.-C. Shen, Y. Tachwali, C.-J. Hsu, and S. S. Bhattacharyya, "Configurable, resource-optimized FFT architecture for OFDM communication," in *Proc. IEEE ICASSP*, 2013, pp. 2746–2750.
- [15] I. Cho, T. Patyk, D. Guevorkian, J. Takala, and S. Bhattacharyya, "Pipeline FFT for wireless communications supporting 128–2048/1536-point transforms," in *Proc. IEEE GlobalSIP*, Dec. 2013, pp. 1242–1245.
- [16] F. Qureshi, M. Garrido, and O. Gustafsson, "Unified architecture for 2, 3, 4, 5, and 7-point DFTs based on Winograd Fourier transform algorithm," *IET Electronics Letters*, vol. 49, no. 5, pp. 348–349, 2013.