# RECONFIGURABLE ARCHITECTURE FOR ULTRASONIC SIGNAL COMPRESSION AND TARGET DETECTION

Erdal Oruklu, Guilherme Cardoso, and Jafar Saniie

Electrical and Computer Engineering Department, Illinois Institute of Technology, Chicago, IL

# ABSTRACT

In this paper, we present a reconfigurable ultrasonic processor that can simultaneously support two different ultrasonic imaging applications: ultrasonic target detection and signal compression. The underlying hardware design exploits the fact that both applications share the same algorithmic fundamentals. A unified architecture implements signal decomposition and reconstruction with the forward and inverse DCT and DWT. After the forward transform step, a thresholding operation is applied to discriminate either frequency bands (for target detection) or coefficient amplitudes (for data compression). Its flexibility and modular design make this reconfigurable architecture an effective and practical solution for real-time ultrasonic imaging applications.

#### 1. INTRODUCTION

Ultrasonic target detection and classification in the presence of high scattering noise (clutter) is a significant and challenging problem. Another challenge for real-time ultrasonic imaging applications is the large amount of data that must be processed for image formation and/or transmitted for remote analysis by experts. Consequently, it is desirable to use data compression techniques to reduce the data volume while maintaining signal integrity, and to facilitate the analysis and remote access of ultrasonic information through wired or wireless communication channels or computer networks. In this paper, we present a reconfigurable architecture for ultrasonic signal compression and target detection in high scattering noise. The design is based on a run-time configurable architecture that provides increased flexibility and adaptability. In addition, this architecture handles the high computational load of real-time applications while minimizing area and power consumption.

Target detection algorithms are based on the premise that clutter echoes exhibit randomness and are more sensitive to frequency shifts than target echoes [1]. Therefore, frequency-diverse signal decompositions such as the discrete cosine transform (DCT) and the discrete wavelet transform (DWT) can be used to differentiate target information from clutter echoes. Owing to their energy compaction properties, these transforms are also beneficial for ultrasonic signal compression. We have designed a reconfigurable architecture that can carry out both the DWT and the DCT for subband decomposition of ultrasonic data. The DWT component of this architecture is based on the lifting scheme [2], which requires 2 to 4 times fewer arithmetic operations than a conventional filter-convolution architecture for the DWT. The DCT component uses a recursive architecture that can be realized with simple IIR filters [3,4]; this architecture is especially suitable for a large number of data points.

Section 2 describes the algorithms used for frequency-diverse ultrasonic target detection, which involve subband decomposition and frequency band selection for target-to-clutter ratio enhancement. Section 3 presents the performance of the data compression algorithm for both the DWT- and DCT-based methods. Section 4 presents the reconfigurable architecture for both detection and compression.

#### 2. ULTRASONIC TARGET DETECTION

Figure 1 shows the components of the ultrasonic target detection algorithm. An ultrasonic measuring system handles data acquisition. DWT or DCT can be utilized to decompose the digitized ultrasonic signal into subbands and provide frequency-diverse representation [5].



Figure 1. Ultrasonic Target Detection System

The task of the target detection algorithm is to select a number of windows in order to decorrelate the clutter echoes. Here, a window represents a band-pass filter for the DCT method; for the DWT method, a window is a group of wavelet scales. The inverse transform is applied to the output of each window, and the resulting time-domain signals are fed into a post-processor. This post-processor, in the final stage, is a decision block that reconstructs the time-domain signal from the incoming channels according to order statistics rules [1].

The selection of the windows is governed by a priori knowledge of the transducer bandwidth and the sampling rate, and the windows incorporate the frequency bands that are likely to carry target echo information. In hardware, the windowing operation (i.e., frequency band selection) is implemented by a thresholding block composed of a comparator and a sequencer.

The wavelet kernel used in wavelet analysis also affects the frequency decomposition performance. Implementing only one kernel in hardware would limit target detection in different environments; therefore, this architecture supports several wavelet kernel configurations to improve system adaptability. Smooth wavelet kernels are more successful in energy compaction [5], and signal energy compaction is essential for isolating target echo information in different frequency bands.

Figure 2 shows experimental results for improved target detection using the DCT, the DWT, and the minimization algorithm [1] for post-processing. The target represents a defect inside a steel block. The data length N is 1024 points, sampled at 100 MHz. Both the DCT and the DWT perform well, improving the SNR by 10 dB when the input SNR is close to 0 dB.



Figure 2. Target Detection Results
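As an illustration of the DCT path in Figure 1, the sketch below splits a record into band-pass windows, inverse-transforms each window, and combines the channels with a minimization rule. It is a minimal floating-point sketch under stated assumptions, not the hardware design: the orthonormal DCT-II is computed directly from its definition, the windows are hypothetical DCT index ranges `(k_lo, k_hi)` with rectangular edges, and `split_spectrum_minimization` (a hypothetical name) uses one common form of the minimization post-processor, keeping the smallest absolute channel value at each sample.

```python
import math

def dct2(x):
    """Orthonormal DCT-II computed directly from its definition (O(N^2))."""
    n_pts = len(x)
    return [math.sqrt((1.0 if k == 0 else 2.0) / n_pts)
            * sum(x[n] * math.cos(math.pi * (2 * n + 1) * k / (2 * n_pts))
                  for n in range(n_pts))
            for k in range(n_pts)]

def idct2(X):
    """Inverse of dct2 (orthonormal DCT-III)."""
    n_pts = len(X)
    return [sum(math.sqrt((1.0 if k == 0 else 2.0) / n_pts) * X[k]
                * math.cos(math.pi * (2 * n + 1) * k / (2 * n_pts))
                for k in range(n_pts))
            for n in range(n_pts)]

def split_spectrum_minimization(x, windows):
    """Frequency-diverse detection sketch.

    windows: list of (k_lo, k_hi) DCT index ranges, one band-pass window
    each (hypothetical choices, not values from the paper).
    """
    X = dct2(x)
    channels = []
    for k_lo, k_hi in windows:
        # Keep only the coefficients inside the window (rectangular band-pass).
        X_win = [X[k] if k_lo <= k < k_hi else 0.0 for k in range(len(X))]
        # Inverse transform back to a time-domain channel.
        channels.append(idct2(X_win))
    # Minimization: smallest magnitude across all channels at each sample.
    return [min(abs(ch[n]) for ch in channels) for n in range(len(x))]

# Hypothetical 64-sample record and three overlapping windows.
signal = [math.sin(0.5 * n) * math.exp(-((n - 32) / 6.0) ** 2) for n in range(64)]
detected = split_spectrum_minimization(signal, [(6, 14), (8, 16), (10, 18)])
```

With a single window covering all coefficients, the output reduces to the magnitude of the input, which is a convenient sanity check; narrower, overlapping windows are where the decorrelation of clutter is expected to help.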

#### 3. ULTRASONIC SIGNAL COMPRESSION

Data compression is the process of obtaining a more efficient representation of a signal; in ultrasonic imaging applications, it is desirable to use data compression techniques to reduce the data size while maintaining signal integrity. In this section, the data compression performance of the DCT, the Walsh-Hadamard transform (WHT), and the DWT with the Haar and Daub20 (Daubechies with 20 coefficients) kernels is analyzed. The WHT is included for comparison because it is simple to implement in hardware.

For benchmarking, an ultrasonic Gaussian echo, $\exp(-\alpha t^2 + i 2\pi f_c t)$, where $\alpha$ is the bandwidth factor and $f_c$ is the center frequency, is used to examine the performance of the different data compression algorithms. For a Gaussian echo, the 98% bandwidth (BW), which contains 98% of the signal energy, is given by BW = $0.382\sqrt{\alpha}$. Normalizing this BW by the center frequency, NBW = $0.382\sqrt{\alpha}/f_c$, distinguishes narrowband from broadband echoes: a narrowband echo has a small NBW, while a broadband echo has a large NBW. Figure 3 shows the compression performance of the DCT, DWT, and WHT as a function of the echo bandwidth (i.e., NBW), measured as the total energy of the five most energetic transform coefficients. All signals are 512 samples long, with 16 bits per sample. For a broadband signal (NBW = 0.5), the Daub20 DWT outperforms the DCT and WHT, with its five most energetic coefficients capturing over 90% of the signal energy. For a narrowband signal (NBW = 0.2), the DCT outperforms the DWT and WHT. Therefore, although the WHT is simple to realize, it is not a suitable choice for implementation, since it is outperformed in both cases.



Figure 3. Relation between NBW and the five most energetic transform coefficients.
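The compaction metric behind Figure 3, the share of signal energy held by the five most energetic coefficients, can be computed along the following lines. This sketch is for illustration only and rests on assumptions not given in the text: it uses the real part of the Gaussian echo, a hypothetical 100 MHz sampling rate and 5 MHz center frequency, an orthonormal DCT-II, and a full-depth orthonormal Haar DWT; α is solved from the NBW definition above, and 16-bit quantization is not modeled. It does not reproduce the Daub20 or WHT curves, and its printed numbers should not be read as the paper's results.

```python
import math

def gaussian_echo(n_samples, fs, fc, nbw):
    """Real part of exp(-alpha t^2 + i 2 pi fc t), centered in the record.

    alpha is solved from NBW = 0.382 * sqrt(alpha) / fc.
    """
    alpha = (nbw * fc / 0.382) ** 2
    t_center = n_samples / (2.0 * fs)
    echo = []
    for n in range(n_samples):
        t = n / fs - t_center
        echo.append(math.exp(-alpha * t * t) * math.cos(2 * math.pi * fc * t))
    return echo

def dct2_orthonormal(x):
    """Orthonormal DCT-II computed directly (O(N^2), adequate for N = 512)."""
    N = len(x)
    return [math.sqrt((1.0 if k == 0 else 2.0) / N)
            * sum(x[n] * math.cos(math.pi * (2 * n + 1) * k / (2 * N)) for n in range(N))
            for k in range(N)]

def haar_dwt_full(x):
    """Full-depth orthonormal Haar DWT; len(x) must be a power of two."""
    approx, coeffs = list(x), []
    while len(approx) > 1:
        pairs = range(0, len(approx), 2)
        coeffs.extend((approx[i] - approx[i + 1]) / math.sqrt(2) for i in pairs)
        approx = [(approx[i] + approx[i + 1]) / math.sqrt(2) for i in pairs]
    return coeffs + approx

def top_k_energy_fraction(coeffs, k=5):
    """Fraction of total energy held by the k largest-magnitude coefficients."""
    energies = sorted((c * c for c in coeffs), reverse=True)
    return sum(energies[:k]) / sum(energies)

if __name__ == "__main__":
    for nbw in (0.2, 0.5):
        echo = gaussian_echo(512, fs=100e6, fc=5e6, nbw=nbw)
        print(f"NBW={nbw}: DCT top-5 = {top_k_energy_fraction(dct2_orthonormal(echo)):.3f}, "
              f"Haar top-5 = {top_k_energy_fraction(haar_dwt_full(echo)):.3f}")
```

Both transforms are orthonormal, so the total coefficient energy equals the signal energy and the metric is a true fraction between 0 and 1.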

The data compression performance of the DWT depends on the wavelet kernel and its compaction properties. Figure 4 shows the data compression performance of six different wavelet kernels.



Figure 4. Energy accumulated among the five most dominant coefficients of the DWT using the following kernels: a) Haar, b) Daubechies, c) Beylkin, d) Coiflet, e) Symmlet, and f) Vaidyanathan.

This figure shows how much energy is concentrated in the five most dominant coefficients of the DWT as a function of the bandwidth of the ultrasonic signal. These results indicate that the Daub20 wavelet kernel has the best data compression performance, while the Haar wavelet kernel has the worst data compression performance.

#### 4. A UNIFIED ARCHITECTURE DESIGN

The repetition rate of an ultrasonic imaging system dictates the processing time available for real-time applications. A typical repetition rate is 1000 Hz, which leaves a 1 ms interval for processing the acquired data. Figure 5 shows the timing requirements for a typical application. Data acquisition takes approximately 10 µs (1024 samples acquired at a 100 MHz sampling rate). Consequently, the target detection system must process the data and then display, transmit, or store the results within the remaining 990 µs.





Figure 5. Timing requirements
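The timing arithmetic above can be checked with a few lines of Python. The repetition rate, sampling rate, and record length come from the text; the 100 MHz processing clock used to express the budget in clock cycles is a hypothetical assumption, and `processing_budget` is an illustrative name.

```python
def processing_budget(prf_hz, fs_hz, n_samples, clock_hz):
    """Return (frame, acquisition, budget) in seconds and the budget in clock cycles."""
    frame_s = 1.0 / prf_hz        # time between successive pulses
    acq_s = n_samples / fs_hz     # time to digitize one record
    budget_s = frame_s - acq_s    # time left for processing and output
    return frame_s, acq_s, budget_s, round(budget_s * clock_hz)

# Values from the text; the 100 MHz processing clock is a hypothetical assumption.
frame_s, acq_s, budget_s, budget_cycles = processing_budget(1000, 100e6, 1024, 100e6)
print(f"acquisition {acq_s * 1e6:.2f} us, budget {budget_s * 1e6:.2f} us, "
      f"{budget_cycles} cycles")
# prints: acquisition 10.24 us, budget 989.76 us, 98976 cycles
```

At the assumed clock, roughly 99,000 cycles are available per pulse for the transform, thresholding, inverse transform, and post-processing stages combined.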

The multi-purpose design of the ultrasonic processor demands a reconfigurable architecture capable of realizing both the target detection and the data compression algorithms. This architecture should support subband decomposition through the DCT or DWT and apply adaptable thresholding methods for both applications.



Figure 6. System Components

Figure 6 shows the main components of the ultrasonic processor architecture. The input memory holds the ultrasonic data acquired with the experimental measuring system. These data are fed into the forward transform block, which operates under the direction of the control logic block. The forward transform block implements the DCT and DWT cores, and the intermediate results are stored in a buffer. If the desired operation mode is data compression, a sequencer block applies a hard threshold to the coefficient amplitudes. Since a major portion of the transform coefficients falls below the threshold value, this results in a significant reduction in data size. If the application goal is target detection, the sequencer instead selects certain desirable frequency channels or wavelet scales from the intermediate results. This thresholding operation discriminates the subbands in which target information is dominant, and these subbands are selected for signal reconstruction (up to 3 scales for the DWT and 8 channels for the DCT have been used). The inverse transform block uses the same hardware resources as the forward transform block; however, they are reconfigured for multi-channel operation. The post-processing block applies order statistics methods such as minimization, and the outcome is stored in the output memory to be transmitted or displayed. The control logic block is initialized by user input and is designed to perform the following tasks:

- Determine the type of operation: data compression or target detection.
- Select the transform (DCT or DWT) for the forward signal transform.
- Select the filter coefficients (i.e., the wavelet kernel) for the DWT.
- Re-allocate hardware resources for the inverse transform channels.
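The compression-mode behavior of the sequencer described above, a hard threshold on coefficient amplitudes, can be modeled as follows. This is a minimal sketch: the function names are hypothetical, and the storage format (each retained coefficient kept as its index plus a 16-bit value) is an assumption used only to show how the compression ratio depends on the number of retained coefficients; the paper does not specify the output format.

```python
def hard_threshold(coeffs, threshold):
    """Return (index, value) pairs for coefficients with |value| >= threshold."""
    return [(i, c) for i, c in enumerate(coeffs) if abs(c) >= threshold]

def expand(pairs, n):
    """Rebuild a dense length-n coefficient vector; dropped coefficients become 0."""
    dense = [0.0] * n
    for i, c in pairs:
        dense[i] = c
    return dense

def compression_ratio(n, n_kept, value_bits=16):
    """Original bits / stored bits, assuming (index, value) storage (hypothetical)."""
    if n_kept == 0:
        return float("inf")
    index_bits = max(1, (n - 1).bit_length())   # e.g. 10 bits address 1024 coefficients
    return (n * value_bits) / (n_kept * (index_bits + value_bits))

coeffs = [5.0, -0.2, 0.1, -3.0, 0.05, 0.0, 1.2, -0.4]
kept = hard_threshold(coeffs, 1.0)
```

Here `kept` holds three of the eight coefficients; feeding `expand(kept, 8)` to the inverse transform would reconstruct an approximation of the original record, and for a 1024-coefficient record that keeps 32 coefficients the assumed format gives a ratio of about 19.7.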

The main challenge for this unified architecture is the design of the processing elements (PEs), which are the building blocks of the transform cores. A PE can be programmed for a specific wavelet kernel or for DCT operation, and arrays of PEs can be cascaded for concurrent execution. Each PE consists of dedicated datapath elements such as a multiplier, an adder, registers, and multiplexers (see Figure 7). This PE design is optimized for wavelet implementation based on the lifting scheme [2] and for DCT implementation based on recursive structures [3]. One PE is used for each lifting step, and two PEs are required for each recursive IIR structure of the DCT. Both the lifting scheme and the recursive DCT are especially suitable when the transform size N is large.



Figure 7. Processing Element (PE) architecture

#### 4.1. DWT implementation using the lifting scheme

A key advantage of the lifting scheme is that the wavelet stages (subband decompositions) can be implemented in parallel. If enough PEs are available, the stages can be pipelined so that all decomposition stages execute in parallel with minimum latency. Figure 8 shows a forward DWT implementation. For this particular wavelet kernel (Daubechies-4), four lifting steps are required; therefore, four PEs are used in one array to implement a single wavelet stage. A PE array completes one stage of computation, stores the results, and starts the next stage using only the low-pass half of the new results. For a 1024-point transform, ten stages must be completed to obtain all the wavelet coefficients. Using more than one PE array improves throughput by processing different stages simultaneously. An interconnection bus between PEs allows the size of the PE arrays to be flexible, so wavelet kernels that require different numbers of lifting steps can be implemented by reconfiguring the PE array structure in the transform block [6].



Figure 8. Forward DWT implementation using multiple PE arrays
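A single forward DWT stage using lifting can be sketched as below for the Daubechies-4 kernel. The sketch uses a standard lifting factorization of the Daubechies-4 filters (three lifting steps followed by a scaling step, counted here as the fourth); the exact factorization and fixed-point arithmetic used in the PE arrays may differ. Boundaries are handled by periodic extension, an assumption made for simplicity, and the function names are illustrative. Repeating the stage on the low-pass output gives the ten stages needed for a 1024-point transform.

```python
import math

SQRT3 = math.sqrt(3.0)
SQRT2 = math.sqrt(2.0)

def d4_stage_forward(x):
    """One periodic Daubechies-4 DWT stage via lifting; returns (low, high)."""
    if len(x) % 2:
        raise ValueError("stage input length must be even")
    half = len(x) // 2
    even, odd = x[0::2], x[1::2]
    # Step 1 (update): s1[n] = x[2n] + sqrt(3) x[2n+1]
    s1 = [even[n] + SQRT3 * odd[n] for n in range(half)]
    # Step 2 (predict): s1[n - 1] wraps to the last element for n = 0 (periodic)
    d1 = [odd[n] - SQRT3 / 4 * s1[n] - (SQRT3 - 2) / 4 * s1[n - 1]
          for n in range(half)]
    # Step 3 (update): d1[n + 1] wraps to the first element at the end (periodic)
    s2 = [s1[n] - d1[(n + 1) % half] for n in range(half)]
    # Step 4 (scaling): makes the stage orthonormal
    low = [(SQRT3 - 1) / SQRT2 * v for v in s2]
    high = [(SQRT3 + 1) / SQRT2 * v for v in d1]
    return low, high

def d4_stage_inverse(low, high):
    """Undo d4_stage_forward by running the lifting steps in reverse order."""
    half = len(low)
    s2 = [SQRT2 / (SQRT3 - 1) * v for v in low]
    d1 = [SQRT2 / (SQRT3 + 1) * v for v in high]
    s1 = [s2[n] + d1[(n + 1) % half] for n in range(half)]
    odd = [d1[n] + SQRT3 / 4 * s1[n] + (SQRT3 - 2) / 4 * s1[n - 1]
           for n in range(half)]
    even = [s1[n] - SQRT3 * odd[n] for n in range(half)]
    x = [0.0] * (2 * half)
    x[0::2], x[1::2] = even, odd
    return x

def d4_dwt(x, stages):
    """Multi-stage DWT: each stage re-processes only the low-pass output."""
    low, details = list(x), []
    for _ in range(stages):
        low, high = d4_stage_forward(low)
        details.append(high)
    return low, details

record = [math.sin(2 * math.pi * 5 * n / 1024) for n in range(1024)]
approx, details = d4_dwt(record, 10)   # 1024 points -> ten stages
```

Because each lifting step only adds a multiple of one polyphase branch to the other, the inverse simply runs the same steps backwards with flipped signs, which is why one PE per step suffices in both directions.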

#### 4.2. Recursive DCT implementation

Parallel implementation of fast DCT algorithms is impractical when the transform is very large. An alternative is to use recursive structures [3,4]. These recursive implementations have very regular VLSI structures and are especially suitable for large transforms (N > 128). They use simple IIR filters to obtain the DCT coefficients. Figure 9 shows the IIR filters used for the even and odd coefficients.

For an input signal x[n] of even length N, the DCT coefficient y[k] for even k is given by [3]:

$$y[k] = \sqrt{\frac{2}{N}}\, E_k \cdot (-1)^{k/2} \cdot g_{N/2-1}[k]$$

where $E_k = \sqrt{1/2}$ for $k = 0$ and $E_k = 1$ for $k \neq 0$, $\theta_k = k\pi/N$, and

$$g_j[k] = \sum_{n=0}^{j} w_k[j-n] \cos\left(\left(n+\tfrac{1}{2}\right)\theta_k\right)$$

and

$$w_k[n] = x[n] + (-1)^k\, x[N-1-n], \quad n = 0, 1, \ldots, \frac{N}{2}-1$$

When k is odd, y[k] is given as:

$$y[k] = \sqrt{\frac{2}{N}}\, E_k \cdot (-1)^{(k-1)/2} \cdot h_{N/2-1}[k]$$

where

$$h_j[k] = \sum_{n=0}^{j} w_k[j-n] \sin\left(\left(n+\tfrac{1}{2}\right)\theta_k\right)$$



Figure 9. Recursive structures for DCT [3]: a) even-coefficient and b) odd-coefficient computation.
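Writing the sums for $g_j[k]$ and $h_j[k]$ as convolutions of $w_k$ with $\cos((n+\tfrac{1}{2})\theta_k)$ and $\sin((n+\tfrac{1}{2})\theta_k)$, where $\theta_k = k\pi/N$, gives second-order IIR recursions with zero initial conditions:

$$g_j[k] = 2\cos\theta_k\, g_{j-1}[k] - g_{j-2}[k] + \cos\tfrac{\theta_k}{2}\,\big(w_k[j] - w_k[j-1]\big)$$

$$h_j[k] = 2\cos\theta_k\, h_{j-1}[k] - h_{j-2}[k] + \sin\tfrac{\theta_k}{2}\,\big(w_k[j] + w_k[j-1]\big)$$

These recursions are our own derivation from the definitions above and may be arranged differently from the filters in [3]. The floating-point sketch below runs them for every k and checks the result against a direct DCT-II evaluation; it ignores the fixed-point arithmetic a hardware PE would use, and the function names are illustrative.

```python
import math

def dct_direct(x):
    """Reference DCT-II: y[k] = sqrt(2/N) E_k sum_n x[n] cos((2n+1) k pi / (2N))."""
    N = len(x)
    return [math.sqrt(2.0 / N) * (math.sqrt(0.5) if k == 0 else 1.0)
            * sum(x[n] * math.cos((2 * n + 1) * k * math.pi / (2 * N)) for n in range(N))
            for k in range(N)]

def dct_recursive(x):
    """DCT-II via second-order IIR recursions: g_j[k] for even k, h_j[k] for odd k."""
    N = len(x)
    if N % 2:
        raise ValueError("N must be even")
    half = N // 2
    y = []
    for k in range(N):
        theta = k * math.pi / N
        sign = 1.0 if k % 2 == 0 else -1.0                       # (-1)^k
        w = [x[n] + sign * x[N - 1 - n] for n in range(half)]    # folded input w_k[n]
        fb = 2.0 * math.cos(theta)                               # feedback coefficient
        prev1 = prev2 = 0.0                                      # filter outputs at j-1, j-2
        w_prev = 0.0                                             # w_k[j-1], zero for j = 0
        if k % 2 == 0:
            gain, post = math.cos(theta / 2), (-1) ** (k // 2)
            for j in range(half):
                out = fb * prev1 - prev2 + gain * (w[j] - w_prev)
                prev2, prev1, w_prev = prev1, out, w[j]
        else:
            gain, post = math.sin(theta / 2), (-1) ** ((k - 1) // 2)
            for j in range(half):
                out = fb * prev1 - prev2 + gain * (w[j] + w_prev)
                prev2, prev1, w_prev = prev1, out, w[j]
        e_k = math.sqrt(0.5) if k == 0 else 1.0
        # prev1 now holds g or h at j = N/2 - 1
        y.append(math.sqrt(2.0 / N) * e_k * post * prev1)
    return y

sample = [1.0, 2.0, 0.5, -1.0, 3.0, 0.0, -2.0, 1.5]
err = max(abs(a - b) for a, b in zip(dct_recursive(sample), dct_direct(sample)))
print(f"max difference from direct DCT-II: {err:.2e}")
```

Each coefficient costs N/2 filter iterations with one feedback multiply and one input multiply per iteration, which matches the text's point that a pair of simple datapaths can produce any coefficient sequentially.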

Although sequential operation is not as fast as parallel computation, the hardware requirements and power consumption are significantly reduced. Because each DCT coefficient is computed independently of the others, system throughput can be improved by adding more IIR units. Two programmable PEs are used to implement each IIR structure in Figure 9. For the inverse transform channels, all available PE resources can be allocated to one channel, or they can be distributed across the channels for parallel execution. Therefore, the configurability of the architecture plays an important role in enhancing the throughput of the DCT realization.
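To illustrate how adding IIR units trades hardware for throughput, the sketch below counts clock cycles for a forward DCT under simplifying assumptions that are ours, not the paper's: each recursive filter performs one recursion step per clock cycle, each coefficient needs N/2 steps, coefficients are distributed round-robin across units, pipeline fill and control overhead are ignored, and the clock runs at a hypothetical 100 MHz. Only the two-PEs-per-IIR-unit figure comes from the text; `dct_compute_cycles` is an illustrative name.

```python
import math

def dct_compute_cycles(n, iir_units, cycles_per_step=1):
    """Cycles for all n DCT coefficients using iir_units parallel recursive filters.

    Hypothetical model: n/2 recursion steps per coefficient, one step per
    cycle per unit, no pipeline fill or control overhead.
    """
    if n % 2 or iir_units < 1:
        raise ValueError("n must be even and iir_units >= 1")
    batches = math.ceil(n / iir_units)   # coefficients are independent of each other
    return batches * (n // 2) * cycles_per_step

CLOCK_HZ = 100e6   # hypothetical clock frequency
for units in (1, 4, 8, 16):
    cycles = dct_compute_cycles(1024, units)
    print(f"{units:2d} IIR units ({2 * units} PEs): {cycles} cycles, "
          f"{cycles / CLOCK_HZ * 1e6:.1f} us")
```

Under these assumptions, eight IIR units (16 PEs) would finish a 1024-point forward DCT in about 655 µs, inside the roughly 990 µs budget of Section 4, while four units (about 1311 µs) would not; the inverse channels and post-processing would need additional time on top of this.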

#### 5. CONCLUSION

In this paper, a reconfigurable architecture for ultrasonic signal compression and target detection has been presented. A unified hardware implementation is possible because both algorithms are designed to share the subband decomposition logic and adaptable thresholding. This architecture is a flexible and efficient solution for real-time ultrasonic imaging systems in which low power consumption and compactness are critical.

#### 6. REFERENCES

- [1] J. Saniie and D. T. Nagle, "Analysis of order statistic CFAR threshold estimators for improved ultrasonic flaw detection", *IEEE Trans. on Ultrasonics, Ferroelectrics and Frequency Control*, vol. 39, no. 5, pp. 618-630, September 1992.
- [2] W. Sweldens, "The lifting scheme: A custom-design construction of biorthogonal wavelets", *Appl. Comput. Harmon. Anal.*, pp. 186-200, 1996.
- [3] C. Chen, B. Liu, J. Yang, and J. Wang, "Efficient recursive structures for forward and inverse discrete cosine transform", *IEEE Trans. on Signal Processing*, vol. 52, no. 9, pp. 2665-2669, September 2004.
- [4] J. Yang and C. Fan, "Recursive cosine transforms with selectable fixed-coefficient filters", *IEEE Trans. on Circuits and Systems-II*, vol. 46, no. 2, pp. 211-216, February 1999.
- [5] E. Oruklu and J. Saniie, "Ultrasonic flaw detection using discrete wavelet transform for NDE applications", *IEEE Symposium on Ultrasonics*, August 2004.
- [6] H. Liao, M. K. Mandal, and B. F. Cockburn, "Efficient architectures for 1-D and 2-D lifting based wavelet transforms", *IEEE Trans. on Signal Processing*, vol. 52, no. 5, pp. 1315-1326, May 2004.