# LARGE-SCALE FPAA DEVICES FOR SIGNAL PROCESSING APPLICATIONS

Christopher M. Twigg, Paul Hasler, and David V. Anderson

Electrical and Computer Engineering, Georgia Institute of Technology, Atlanta, GA, 30332-0250. ctwigg, phasler, dva@ece.gatech.edu

### ABSTRACT

We present a viewpoint showing that analog signal processing approaches are becoming configurable and programmable like their digital counterparts, while retaining a large advantage in computational efficiency for a given power budget. We present recent results in programmable and configurable analog signal processing that illustrate the widespread potential of these approaches. We also discuss issues with configurable systems, including size, power, and computational tradeoffs, and address the computational efficiency of these approaches.

Index Terms— FPAA, Reconfigurable Analog Processing

## **1. RECONFIGURABLE ANALOG OPPORTUNITY**

We describe Large-Scale Field-Programmable Analog Arrays (FPAAs) as a method for rapidly prototyping analog systems. Large-scale FPAAs are capable of computation and signal processing tasks similar in size and complexity to those handled by FPGA devices, as illustrated in Fig. 1. DSP systems take tremendous advantage of programmability to develop and redevelop one IC for a wide range of applications well after the devices have been fabricated. Our approach considers programmable and configurable analog signal processing. Although commercial analog / mixed-signal design has not evolved like digital design, the opportunities for digital-like analog system design are enormous, if only to satisfy the analog interfacing demand created by the rapid growth of digital system design. As ICs scale to more advanced processes with ever-increasing design costs, programmable designs are essentially required to be commercially viable. In most cases, the result is not an optimally designed solution, but a sufficient solution to rapidly address key consumer issues.

The potentially huge improvements in computational efficiency for analog systems over custom digital systems, first hypothesized by Mead [1], could give an incredible jump in system performance that translates to a wide range of power-constrained, portable applications. Figure 2 shows



Figure 1: Comparison between Field Programmable Gate Arrays (FPGA) and Large-Scale Field Programmable Analog Arrays (FPAA). (a) FPGAs typically have a wide range of logic blocks and specialized blocks interconnected by routing with sufficient complexity for signal processing tasks. (b) Large-Scale FPAAs are similar, but with analog components capable of similar DSP tasks.

the impact of this technology; if we have a programmable and configurable analog signal processing device, then we have a technology that will improve the power efficiency (computation for a given power budget) by nearly the same amount as all of the digital improvements from transistor scaling since the early 1980s, as seen in Fig. 2 together with the resulting trend identified by Gene Frantz [2]. Several analog signal processing computations, also shown in Fig. 2, show computational efficiencies improved by a factor of 1000 or more over their digital counterparts, verifying Mead's hypothesis [1]. Making analog systems programmable and configurable, like digital microprocessors, DSP ICs, and FPGAs, makes this comparison relevant because users have become accustomed to utilizing programmable solutions. The resulting impact on applications is similar to the impact of digital scaling over the last two decades.

We recently presented the first Large-Scale FPAA, the Reconfigurable Analog Signal Processor (RASP) IC, which utilizes 56 CABs and multi-level interconnect routing with various levels of analog computational granularity, using over 50,000 programmable analog elements [3]. In this paper, we discuss the signal processing opportunities for these devices, based upon the circuit foundation presented in [3].



Figure 2: Choosing between programmable analog and digital signal processing. (a) Scaling of DSP processors, the resulting fit over time (known as Gene's law), and the resulting jump (20 year leap) using analog signal processing. (b) Computational Efficiency Comparison (based on experimental results) of low-power signal processors to general analog signal processing algorithms, in units of MMAC/uW (= TMAC/W). The analog values (typical value) are compared to the digital signal processing value, normalized to 10MMAC/mW (best low-power digital processors currently available).
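The unit relationships and the "20 year leap" in the caption can be checked with a few lines of arithmetic. The sketch below is illustrative only: the 10 MMAC/mW baseline and the factor of 1000 come from the text, while the doubling periods passed to `equivalent_scaling_years` are assumptions, since the fitted rate from [2] is not stated here.

```python
import math


def mmac_per_mw_to_tmac_per_w(value: float) -> float:
    """Convert MMAC/mW to TMAC/W.

    1 MMAC/mW = 1e6 MAC/s per 1e-3 W = 1e9 MAC/s per W = 1e-3 TMAC/W.
    (Equivalently, 1 MMAC/uW = 1 TMAC/W, as stated in the caption.)
    """
    return value * 1e-3


def equivalent_scaling_years(gain: float, doubling_period_years: float) -> float:
    """Years of steady exponential scaling needed to accumulate `gain`."""
    return doubling_period_years * math.log2(gain)


DIGITAL_BASELINE_MMAC_PER_MW = 10.0  # normalization used in the Fig. 2 caption
ANALOG_GAIN = 1000.0                 # "1000 or more" quoted in the text

analog_mmac_per_mw = DIGITAL_BASELINE_MMAC_PER_MW * ANALOG_GAIN
print("digital:", mmac_per_mw_to_tmac_per_w(DIGITAL_BASELINE_MMAC_PER_MW), "TMAC/W")
print("analog :", mmac_per_mw_to_tmac_per_w(analog_mmac_per_mw), "TMAC/W")

# ASSUMPTION: these doubling periods are illustrative, not taken from [2].
for period in (1.5, 2.0):
    years = equivalent_scaling_years(ANALOG_GAIN, period)
    print(f"efficiency doubling every {period} yr -> about {years:.1f} years")
```

Under the assumed two-year doubling period, a factor of 1000 corresponds to roughly two decades of scaling, which is consistent with the leap described in the caption; a faster assumed rate gives a shorter equivalent span.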

## **2. LARGE-SCALE FPAA DEVICES**

Figure 3 shows the RASP block diagram and the resulting die photo of the working IC. The IC is arranged in an 8 x 7 array of CABs, with two CAB versions: one that includes a 4x4 vector-matrix multiplication (VMM), placed along the top and bottom rows, and one that does not. We designed the arrays to have a mixture of analog granularity, so we have access to transistor-level functions as well as some higher-level signal processing features.
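As a purely numerical illustration of what a 4x4 VMM block computes, the following sketch multiplies a 4-element input vector by a 4x4 weight matrix. It is a behavioral model only, not a description of the circuit: the function name `vmm` and the weight values are hypothetical, and in the IC the weights would correspond to programmed analog elements.

```python
from typing import List, Sequence


def vmm(inputs: Sequence[float], weights: Sequence[Sequence[float]]) -> List[float]:
    """Vector-matrix multiply: outputs[j] = sum_i inputs[i] * weights[i][j]."""
    if len(weights) != len(inputs):
        raise ValueError("weights must have one row per input")
    n_out = len(weights[0])
    if any(len(row) != n_out for row in weights):
        raise ValueError("all weight rows must have the same length")
    return [
        sum(x * row[j] for x, row in zip(inputs, weights))
        for j in range(n_out)
    ]


# Hypothetical 4x4 weight matrix (illustrative values only).
W = [
    [1.0, 0.0, 0.0, 0.0],
    [0.0, 2.0, 0.0, 0.0],
    [0.0, 0.0, 3.0, 0.0],
    [0.0, 0.0, 0.0, 4.0],
]
x = [1.0, 1.0, 1.0, 1.0]
print(vmm(x, W))
```

Each output is a weighted sum of all four inputs, so one 4x4 block performs 16 multiply-accumulate operations per evaluation.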

Programmable floating-gate circuit technology enables our FPAAs to provide area-efficient, accurately programmable analog circuitry that can be easily integrated into a larger digital/mixed-signal system [3,4]. Voltages can be programmed to a resolution of 13 bits or higher, and currents can be programmed to within 0.2% over a range from below 100 pA to above 3 µA at commercially usable speeds. These programmable elements can hold this



Figure 3: Our Reconfigurable Analog Signal Processing IC. (a) Block diagram of the RASP IC; the IC utilizes 56 CABs in a multi-level routing scheme. (b) Block diagram of the CAB components. (c) Die photo of the RASP IC, which occupies a 3 mm x 3 mm area in a 0.35 µm CMOS process.

resolution over 10 years at room temperature [5]. The programmable elements allow for efficient switch elements that have a dual role as computational elements.
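To put the quoted programming figures in perspective, the short calculation below converts them into a number of decades and an approximate count of distinguishable current levels. It is a rough sketch under one assumption that is not from the source: that the 0.2% accuracy can be read as a geometric step size between adjacent programmable current values.

```python
import math

I_MIN = 100e-12   # lower end of the quoted current range (100 pA)
I_MAX = 3e-6      # upper end of the quoted current range (3 uA)
REL_STEP = 0.002  # 0.2% programming accuracy quoted in the text
VOLTAGE_BITS = 13 # quoted voltage resolution


def decades_spanned(i_min: float, i_max: float) -> float:
    """Number of decades between the smallest and largest current."""
    return math.log10(i_max / i_min)


def geometric_levels(i_min: float, i_max: float, rel_step: float) -> float:
    """Levels between i_min and i_max when adjacent levels differ by rel_step.

    ASSUMPTION: treats the quoted accuracy as a constant relative step.
    """
    return math.log(i_max / i_min) / math.log1p(rel_step)


levels = geometric_levels(I_MIN, I_MAX, REL_STEP)
print(f"current range spans about {decades_spanned(I_MIN, I_MAX):.1f} decades")
print(f"about {levels:.0f} distinguishable levels (~{math.log2(levels):.1f} bits)")
print(f"13-bit voltage resolution gives {2 ** VOLTAGE_BITS} levels")
```

Under this reading, the current programming covers several thousand levels across more than four decades, which is the same order as the 13-bit voltage resolution.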

A range of circuits and signal processing systems have been compiled on this FPAA; many operate at frequencies ranging from 10 Hz to the low MHz. Internal bandwidths are greater than 50 MHz, and measured noise levels are identical to those of equivalent custom circuits. Fig. 4 shows experimental data for an IC compiled as a bandpass filter and amplitude detector [4]. Figure 5 shows the schematic for compiling a low-power analog perceptual model block for an MP3 audio encoder. We have compiled multiple systems, including constant-Q filterbanks [3], vector-matrix multiplication, FIR filters, acoustic null steering, and an analog MP3 perceptual model; Table 1 shows the range of signal processing circuits compiled on this FPAA, in addition to a wide range of basic analog circuits. In operating mode, no external biases are required.

These Large-Scale FPAA devices can utilize a range of structures and interconnection schemes between the CABs,



Figure 4: RASP 1.5 IC. Compiled filterbank and envelope detector.

schemes that can be incorporated with other routing fabrics when required by a particular application. For example, the CAB elements can comprise electrical models of biological channels [6]. Using this 2D interconnection, one can program a wide range of dendritic morphologies utilizing nearest-neighbor interconnects. A recent IC generation [6] achieves over 12,000 connections, which allows a detailed pyramidal cell implementation in $4\,\text{mm}^2$ (0.35 µm CMOS).

Table 1: Representative Compiled Circuits and the resulting number of CABs needed for successful implementation on the RASP 2.5 IC. Many of these circuits could be implemented together on a single RASP 2.5 IC.

| Circuit                                        | # of CABs |
|------------------------------------------------|-----------|
| Van der Pol Oscillator                         | 4         |
| Ramp ADC                                       | 1         |
| Dendrite of n compartments with n synapses     | n         |
| Resonator filterbank of n stages               | 5n / 3    |
| Arbitrary Waveform Generator (length n)        | 2n        |
| Analog Distributed Arithmetic (n bit pipeline) | 3 + n/3   |
| WTA (n inputs)                                 | n         |
| MP3 encoder (32 stages)                        | 64        |
| HMM classifier (n stages, m outputs)           | n + m + 1 |
| Algorithmic ADC                                | 4         |
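The CAB counts in Table 1 can be used to check whether several circuits fit together on one device. The sketch below encodes a subset of the table rows and sums the requirement for a hypothetical design. Rounding fractional counts up to whole CABs is an assumption, as is the budget of 56 CABs, which is the figure quoted for the RASP IC in [3]; the CAB count of the RASP 2.5 is not stated here.

```python
from typing import Callable, Dict, List, Tuple


def ceil_div(a: int, b: int) -> int:
    """Integer ceiling division (ASSUMPTION: partial CABs round up)."""
    return -(-a // b)


# Subset of Table 1, as functions of the size parameter n.
CAB_COST: Dict[str, Callable[[int], int]] = {
    "van_der_pol_oscillator": lambda n: 4,
    "ramp_adc": lambda n: 1,
    "dendrite": lambda n: n,
    "resonator_filterbank": lambda n: ceil_div(5 * n, 3),
    "arbitrary_waveform_generator": lambda n: 2 * n,
    "distributed_arithmetic": lambda n: 3 + ceil_div(n, 3),
    "wta": lambda n: n,
    "algorithmic_adc": lambda n: 4,
}


def cabs_needed(circuit: str, n: int = 1) -> int:
    """CABs required by one circuit of size n, per the table formulas."""
    return CAB_COST[circuit](n)


def total_cabs(design: List[Tuple[str, int]]) -> int:
    """Total CABs for a list of (circuit, n) pairs placed on one IC."""
    return sum(cabs_needed(name, n) for name, n in design)


# Hypothetical design and CAB budget, for illustration only.
AVAILABLE_CABS = 56
design = [
    ("van_der_pol_oscillator", 1),
    ("ramp_adc", 1),
    ("resonator_filterbank", 6),
    ("wta", 8),
]
used = total_cabs(design)
print(f"{used} CABs used, fits: {used <= AVAILABLE_CABS}")
```

This kind of budget check only counts CABs; it says nothing about routing congestion, which a placement tool such as the one in [8] would also have to consider.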

An array of 4 by 4 RASP 2.5 ICs would have 4500 CABs, resulting in a million analog parameters that could fit on a 1 cm x 1 cm 0.35 µm IC; on modern processes the number of CABs is significantly higher, at a level similar to modern FPGA CLBs. Similar to the specialized computational processors found on FPGAs, we expect that specialized analog signal processing blocks can be integrated into the routing fabric and will provide additional performance improvements. We also expect that FPAA ICs will be specialized towards particular applications while retaining most of their generality for a wide range of applications. Finally, the integration of FPAAs with low-power FPGA elements or similar low-power digital cores further opens the potential application space to mixed-signal designs.

Figure 6 shows the basic ideas behind a tool flow that gives a wide range of users access to FPAA approaches. This



Figure 5: Compiled circuits for a perceptual model for an MP3 encoder.



Figure 6: Infrastructure for RASP 2.5 characterization and demonstrations. The approach allows us to draw a circuit schematically (shown in Xcircuit [7]), compile the circuit [8], and target the resulting RASP IC device using an FPGA board. We generate a netlist, allowing SPICE-level verification before targeting.

approach would combine a graphical interface (such as Xcircuit [7]), targeting code (i.e., code developed at Georgia Tech [8]), and a MATLAB user interface to make the layers of abstraction transparent to a system or signal-processing engineer. The wide applicability of these elements will come once they are accessible to application engineers, as opposed to only analog IC designers. This viewpoint opens a wide educational opportunity when utilizing these devices. During the conference presentation, we will demonstrate this working FPAA system by compiling and running typical DSP algorithms on the FPAA device.

Large-Scale FPAAs provide multiple advantages in terms of rapid development, similar to FPGAs, while retaining many of the benefits of custom analog IC design. Once an FPAA IC is defined, the time to develop an algorithm on such a chip, if it can be implemented, would be shorter than the typical analog IC design cycle, analogous to the rapid prototyping common to digital FPGAs. Our FPAA approach can still scale with smaller modern process dimensions, utilizing the native multiple oxide thicknesses to enable long-term storage, while continuing to scale the lateral device dimensions with the lithography. Unlike FPGAs, where the routing and switching elements are not computing elements, the switch elements in our FPAA devices are programmable as computing elements (e.g., nonlinear power-law elements, analog vector-matrix multipliers); therefore, an entire FPAA IC can be used for computation with clever design techniques. Although programmability and configurability usually come with an associated increase in area, the targeted area for an FPAA is comparable to that of custom analog ICs, because programming circuits offset the component area constraints due to device matching; circuits are therefore primarily tuned for the desired SNR. FPAAs do not require significantly more power than custom analog solutions, unlike FPGAs, which consume more power than custom digital approaches. The more an FPAA can compress or refine the outgoing data, the lower the power the IC requires for transmission.

The RASP has a measured bandwidth of 400 MHz through a single switch. The bandwidth through a switch on a local line is 53 MHz, and the bandwidth through a switch on a global line is 6.22 MHz. For analog circuits operating up to a few MHz on this IC, the FPAA circuit will have the same frequency behavior as a custom circuit, because the bias currents are programmable to give the desired response. For a 90 nm process, the switch matrix should handle circuits at 10-100 MHz with behavior similar to custom ICs. Placing switches in parallel increases the available bandwidth and thus improves performance. High-speed architectures use multiple routing levels as well as buffer amplifiers for long connections.

## **3. ACKNOWLEDGEMENTS**

The authors wish to thank Faik Baskaya for help interfacing his FPAA compilation code for use with our devices.

#### REFERENCES

[1] C. Mead, "Neuromorphic electronic systems," *Proceedings of the IEEE*, vol. 78, no. 10, pp. 1629-1636, 1990.

[2] G. Frantz, "Digital signal processor trends," *IEEE Micro*, vol. 20, no. 6, pp. 52-59, Nov.-Dec. 2001.

[3] C. Twigg and P. Hasler, "A Large-Scale Reconfigurable Analog Signal Processor IC," *IEEE CICC*, 2006.

[4] T. S. Hall, C. M. Twigg, J. D. Gray, P. Hasler, and D. V. Anderson, *IEEE CAS I*, Nov. 2005.

[5] V. Srinivasan, G. Serrano, J. Gray, and P. Hasler, CICC 2005.

[6] E. Farquhar, C. Gordon, and P. Hasler, IEEE ISCAS 2006.

[7] Xcircuit introduction, http://opencircuitdesign.com/xcircuit

[8] F. Baskaya, S. Reddy, S. K. Lim, and D. V. Anderson, "Placement for Large-scale Floating-Gate Field Programmable Analog Arrays," *IEEE Trans. VLSI*, vol. 14, no. 8, pp. 906-910, 2006.