# A NOISE-ROBUST ECHO CANCELLER ON V830 MULTIMEDIA RISC PROCESSOR INTEGRATED INTO A CAR NAVIGATION SYSTEM

Yutaka Hiratani, Akihiro Hirano† and Masaya Kanazawa

System Software Department, NEC IC Microcomputer Systems, Ltd., 3-484, Tsukagoshi, Saiwai-Ku, Kawasaki, Kanagawa 210, Japan

†C&C Media Research Laboratories, NEC Corporation, 1-1, Miyazaki 4-Chome, Miyamae-Ku, Kawasaki, Kanagawa 216, Japan

# ABSTRACT

This paper presents a noise-robust, fast-convergence echo canceller and its implementation on a multimedia RISC (Reduced Instruction Set Computer). Faster convergence is achieved by introducing an improved noise power estimator for step-size control. This echo canceller has been implemented on the V830 multimedia embedded RISC and has been integrated into a car navigation system. V830 provides performance comparable to a digital signal processor (DSP) and extended flexibility, while its power consumption is lower than that of a DSP. Computer simulations and measurements using a V830 board show fast convergence and robustness against disturbances such as noise and double-talk, without double-talk detection.

# 1. INTRODUCTION

For mobile telephones, hands-free capability with an echo canceller is essential to both safe driving and comfortable conversation. To overcome the serious influence of noise and double-talk, which hands-free echo cancellers commonly encounter, noise-robust algorithms have been studied [1-3]. Among them, an adaptive step-size algorithm based on the reference input signal power and the estimated noise power [1] is one of the most attractive candidates for its robustness and simplicity. However, its convergence can be slow when the step size becomes too small.

Another requirement for mobile hands-free echo cancellers is cost. A lower price and a smaller size for the echo canceller are important. Sharing hardware with other systems such as audio equipment and a car navigation system appears promising. Such a multi-purpose system should have both signal processing capability and the flexibility of a general-purpose computer. A multimedia embedded RISC (Reduced Instruction Set Computer) [5] satisfies these requirements.

This paper presents a fast-convergence noise-robust echo canceller and its implementation on a multimedia embedded RISC. The conventional adaptive step-size algorithm and its problem with noise-power estimation for step-size control are reviewed in Section 2, followed by a new noise-power estimator. Section 3 describes an implementation of the proposed algorithm on the V830 multimedia embedded RISC processor. A comparison of computational cost with an implementation on a DSP (Digital Signal Processor) and the integration into a car navigation system are also shown. Computer simulations and measurements using a V830 board show the performance of the proposed algorithm.

Fig. 1. Step-size in adaptive step-size algorithm.

# 2. NOISE-ROBUST STOCHASTIC GRADIENT ALGORITHM WITH ADAPTIVE STEP-SIZE

#### 2.1. Basic Algorithm

In the stochastic gradient algorithm with an adaptive step-size [1], the filter coefficient vector  $\mathbf{W}(t)$  is updated by

$$\mathbf{W}(t+1) = \mathbf{W}(t) + \mu(t)e(t)\mathbf{X}(t) \tag{1}$$

where  $t$  is the time index,  $\mathbf{X}(t)$  is the reference input signal vector, and  $e(t)$  is the error signal. The step-size  $\mu(t)$  is calculated by

$$\mu(t) = \frac{\mu_0 P_X(t)}{P_X^2(t) + P_{th}^2(t)}. \tag{2}$$

When the reference input signal power  $P_X(t)$  becomes larger, the step-size  $\mu(t)$  increases until  $P_X(t)$  reaches a threshold  $P_{th}(t)$ . For  $P_{th}(t) < P_X(t)$ ,  $\mu(t)$  becomes smaller. The threshold  $P_{th}(t)$  is controlled using the estimated noise power  $P_N(t)$  by

$$P_{th}(t) = \alpha P_N(t). \tag{3}$$
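The behaviour around the threshold can be checked directly from (2). As a short worked step (treating  $P_{th}$  as fixed while  $P_X$  varies, which is a simplification made here), differentiating (2) with respect to  $P_X$  gives

$$\frac{\partial \mu}{\partial P_X} = \mu_0 \frac{P_{th}^2 - P_X^2}{\left(P_X^2 + P_{th}^2\right)^2},$$

which is positive for  $P_X < P_{th}$ , zero at  $P_X = P_{th}$ , and negative for  $P_X > P_{th}$ . The step-size therefore peaks at the threshold with the value

$$\mu_{\max} = \frac{\mu_0}{2 P_{th}},$$

and for  $P_X \gg P_{th}$  it approaches  $\mu_0 / P_X$ , a normalization by the reference input power, while for  $P_X \ll P_{th}$  it behaves as  $\mu_0 P_X / P_{th}^2$  and vanishes as the reference input fades into the noise.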

Figure 1 demonstrates step-size control in this algorithm. To avoid influence of the residual echo on the noise power estimation, the estimated noise power is updated only when the echo is absent.  $P_N(t)$  is updated by

$$P_N(t+1) = \beta P_N(t) + (1-\beta)e^2(t) \tag{4}$$

when the error signal power is greater than the echo replica power.
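Equations (1) to (4) can be put together as one per-sample update. The following is a minimal sketch, not the authors' implementation: the function names are illustrative, the default parameter values are those quoted later in Section 4.1, and two points the text leaves open are filled in as assumptions (the reference input power is taken as the sum of squares over the tap window, and the "error signal power" and "echo replica power" in the update condition are taken as instantaneous squared values).

```python
import numpy as np


def step_size(p_x, p_n, mu0=0.2, alpha=0.1):
    """Step size of Eq. (2), with the threshold P_th = alpha * P_N of Eq. (3)."""
    p_th = alpha * p_n
    denom = p_x * p_x + p_th * p_th
    # Guard added in this sketch: no adaptation when both powers are zero.
    return mu0 * p_x / denom if denom > 0.0 else 0.0


def adapt_one_sample(w, x_vec, d, p_n, mu0=0.2, alpha=0.1, beta=0.9985):
    """One iteration of the conventional algorithm of Section 2.1.

    w     : filter coefficient vector W(t)
    x_vec : reference input signal vector X(t)
    d     : microphone sample (echo plus noise)
    p_n   : current noise power estimate P_N(t)
    """
    y = float(np.dot(w, x_vec))          # echo replica
    e = d - y                            # error signal e(t)
    p_x = float(np.dot(x_vec, x_vec))    # assumed definition of P_X(t)
    mu = step_size(p_x, p_n, mu0, alpha)
    w_next = w + mu * e * x_vec          # Eq. (1)
    # Conventional condition: the echo is judged absent when the error
    # power exceeds the echo replica power (instantaneous values assumed).
    if e * e > y * y:
        p_n = beta * p_n + (1.0 - beta) * e * e   # Eq. (4)
    return w_next, e, p_n
```

Calling `adapt_one_sample` once per input sample, and feeding back the returned coefficients and noise estimate, reproduces the structure described above.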

This algorithm is robust against additive noise and requires only a few additional computations compared with the Normalized LMS algorithm [4]. However, its convergence is slow at the beginning of the adaptation. When adaptation starts, the filter coefficients are zero, and until the coefficients grow, the echo replica power is almost zero. A small echo replica power suggests the absence of an echo, so the noise power estimator keeps updating its estimate even when an echo exists. A noise power estimate that includes the echo power is much larger than the actual noise power. This overestimated noise power leads to a smaller step size and slow convergence.

Fig. 2. Block diagram of V830.

#### 2.2. Improved Noise Power Estimation

To avoid the influence of immature filter coefficients, the reference input signal power is used to detect the absence of the echo. When the reference input signal power  $P_X(t)$  is less than a second threshold  $P_0$ , the estimated noise power  $P_N(t)$  is updated by (4). If  $P_0 < P_X(t)$ , i.e., the echo may exist, the latest  $P_N(t)$  is held. Thus, a more precise estimate of the additive noise power can be obtained with fewer computations than in the conventional algorithm.
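The difference between the two update conditions is easiest to see at the start of adaptation. The sketch below is illustrative only: the function names and the numeric values of the signals are assumptions, while the default  $P_0$  and  $\beta$  are the values quoted in Section 4.1. It models the moment when the coefficients are still zero, so the echo replica is zero although the reference input is strong and the error is mostly echo.

```python
def conventional_gate(e, y):
    """Section 2.1: update when the error power exceeds the echo replica power."""
    return e * e > y * y


def proposed_gate(p_x, p_0=1e5):
    """Section 2.2: update only while the reference input power is below P_0."""
    return p_x < p_0


def update_noise_power(p_n, e, allow_update, beta=0.9985):
    """Apply Eq. (4) when the gate is open; otherwise hold the latest estimate."""
    if allow_update:
        return beta * p_n + (1.0 - beta) * e * e
    return p_n


# Start of adaptation (illustrative numbers): zero coefficients give y = 0,
# the far-end talker is active (large P_X), and e consists mostly of echo.
p_n, e, y, p_x = 100.0, 1000.0, 0.0, 1e7
conventional = update_noise_power(p_n, e, conventional_gate(e, y))
proposed = update_noise_power(p_n, e, proposed_gate(p_x))
```

With these numbers the conventional condition lets a single echo-dominated sample pull the estimate from 100 to about 1600, whereas the proposed condition holds it at 100. The proposed gate also reuses  $P_X(t)$ , which is already needed for (2), instead of computing the echo replica power.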

# 3. IMPLEMENTATION ON RISC MICROPROCESSOR

#### 3.1. RISC vs. DSP

For a low-cost and compact mobile hands-free echo canceller, sharing hardware with other systems such as audio equipment and a car navigation system looks promising. As a controller for a car navigation system, low-cost DSPs are not a good choice. For such DSPs, handling byte-size data and accessing a large memory require extra instruction cycles or special hardware. Though RISC processors support such operations, some low-cost RISCs are not equipped with a fast multiply-accumulator (MAC), which is necessary for signal processing applications. A multimedia embedded RISC with a fast MAC [5] is a good candidate for a car navigation system with a hands-free echo cancelling capability.

#### 3.2. V830 Multimedia Embedded RISC Microprocessor

V830 [5] is an embedded RISC microprocessor with extensions suitable for signal processing applications. Figure 2 shows a simplified block diagram of V830. A MAC capable of 32-bit  $\times$  32-bit + 32-bit calculation accelerates execution of signal processing applications such as FIR filtering, an FFT and a DCT. The MAC can start its operation on each instruction cycle and generates a result after two instruction cycles. The on-chip instruction and data RAMs can store programs and data which will be accessed frequently. Thus, access delay caused by slow external memories can be reduced. V830 is also provided with an instruction cache and a data cache.

Tab. 1. Comparison with DSP

| Name                    | V830        | μPD77018A  |
|-------------------------|-------------|------------|
| Type                    | 32-bit RISC | 16-bit DSP |
| Clock [MHz]             | 100         | 52         |
| MIPS                    | 118         | 52         |
| Power consumption [mW]  | 400         | 555        |
| Instructions per tap    | 7.5         | 3          |
| MIPS for EC (\*)        | 26          | 9.5        |
| CPU time for EC (\*) [%] | 22.4        | 18.3       |

(\*) 8 kHz sampling, 300-tap echo canceller (EC).

#### 3.3. Considerations for Implementation on V830

In stochastic gradient algorithms, the calculations required for one tap include two multiply-add, two read and one write operations. For processors with a fast MAC, memory access speed is critical for performance. This fact differentiates RISC processors from DSPs, which have multiple memory banks and parallel data access capability. Since the external bus clock cycle of V830 is two or three times longer than the CPU core clock cycle, access to external memories requires five or seven CPU clock cycles. Therefore, the tapped delay line and the filter coefficients should be stored in the internal data RAM. The capacity of the internal data RAM is sufficient for a mobile hands-free echo canceller with several hundred taps.
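As a rough illustration of why external memory is unattractive here, the per-tap access count above can be turned into a cycle budget. The sketch below makes a deliberately pessimistic assumption that is not in the text: every one of the three accesses per tap goes to external memory, with no caching and no overlap with computation. The tap count and sampling rate are those of Table 1, and the function names are illustrative.

```python
def accesses_per_sample(taps):
    """Memory accesses per input sample: two reads and one write per tap."""
    return 3 * taps


def external_cycles_per_second(taps, fs_hz, cycles_per_access):
    """CPU cycles per second spent on memory if every access were external."""
    return accesses_per_sample(taps) * fs_hz * cycles_per_access


TAPS = 300          # Table 1
FS_HZ = 8000        # Table 1
budget = {c: external_cycles_per_second(TAPS, FS_HZ, c) for c in (5, 7)}
for cycles, total in budget.items():
    print(f"{cycles} cycles/access: {total / 1e6:.1f} M cycles/s")
```

Under these assumptions, memory accesses alone would take about 36 to 50 million cycles per second, a large share of a 100 MHz core clock before any arithmetic is counted.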

The MAC requires special attention: the adder input for the MAC should be updated three instruction cycles before a MAC instruction. Otherwise, an extra wait cycle will be inserted by the program sequencer [5]. Instruction scheduling and loop unrolling [6] reduce such wait cycles. For coefficient update, the wait cycle is avoided by placing a coefficient load instruction three cycles before a coefficient update. Loop unrolling reduces the number of wait cycles for convolution.
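The paper does not list the unrolled code. One common way unrolling removes this kind of wait is to keep several independent accumulators, so that consecutive multiply-adds never depend on a result that is still in flight. The sketch below shows that dependency structure for the convolution; the four-way unrolling factor and the function name are assumptions, and in Python it illustrates only the structure, not any speed gain.

```python
def dot_unrolled4(w, x):
    """Dot product of w and x, unrolled by four with independent accumulators."""
    n = len(w)
    acc0 = acc1 = acc2 = acc3 = 0.0
    limit = n - n % 4
    i = 0
    while i < limit:
        # Each line feeds a different accumulator, so no multiply-add
        # waits on the result of the one issued just before it.
        acc0 += w[i] * x[i]
        acc1 += w[i + 1] * x[i + 1]
        acc2 += w[i + 2] * x[i + 2]
        acc3 += w[i + 3] * x[i + 3]
        i += 4
    while i < n:
        # Remainder taps when the length is not a multiple of four.
        acc0 += w[i] * x[i]
        i += 1
    return (acc0 + acc1) + (acc2 + acc3)
```

The partial sums are combined once after the loop, so the cost of the extra accumulators is a few additions per sample rather than per tap.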

#### 3.4. Comparison with DSP

The implemented echo canceller is compared with an implementation on a DSP. Table 1 compares the specifications, the computational costs and the power consumption of V830 and a 16-bit fixed-point DSP, the  $\mu$ PD77018A by NEC. The number of instructions per tap for V830 is 2.5 times that of the  $\mu$ PD77018A because parallel execution on the DSP reduces the number of instructions, while the clock frequency of V830 is almost twice that of the  $\mu$ PD77018A. Thus, the execution time of the echo canceller on V830 is almost the same as that on the  $\mu$ PD77018A (22.4% versus 18.3% of CPU time in Table 1). The power consumption of V830 is lower than that of the  $\mu$ PD77018A.
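The figures in Table 1 can be cross-checked with simple arithmetic. The sketch below is a worked example, not part of the original measurements: it assumes the tap-proportional load is instructions per tap times taps times sampling rate, and that CPU time is the ratio of the echo canceller's MIPS to the processor's MIPS. The dictionary keys and function names are illustrative.

```python
SAMPLE_RATE_HZ = 8000
TAPS = 300

# Values copied from Table 1.
processors = {
    "V830": {"mips": 118, "instr_per_tap": 7.5, "ec_mips": 26},
    "uPD77018A": {"mips": 52, "instr_per_tap": 3, "ec_mips": 9.5},
}


def tap_loop_mips(instr_per_tap, taps=TAPS, fs_hz=SAMPLE_RATE_HZ):
    """MIPS spent in the per-tap part of the echo canceller."""
    return instr_per_tap * taps * fs_hz / 1e6


def cpu_load_percent(ec_mips, mips):
    """Share of the processor's instruction rate used by the echo canceller."""
    return 100.0 * ec_mips / mips


for name, p in processors.items():
    loop = tap_loop_mips(p["instr_per_tap"])
    load = cpu_load_percent(p["ec_mips"], p["mips"])
    print(f"{name}: tap loop {loop:.1f} MIPS, CPU load {load:.1f} %")
```

Under these assumptions the per-tap part accounts for 18.0 of the 26 MIPS on V830 and 7.2 of the 9.5 MIPS on the DSP, with the remainder presumably spent outside the tap loop. The computed loads of 22.0% and 18.3% are close to the tabulated 22.4% and 18.3%; the small gap for V830 is consistent with rounding of the MIPS entry.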



Fig. 4. Echo and noise power.

#### 3.5. Integration into Car Navigation System

The implemented adaptive filter has been integrated into a car navigation system as middleware for a mobile hands-free echo canceller. Figure 3 shows the car navigation system. This system consists of a main unit and an optional speech recognition unit. The speech recognition unit can also be used as an echo canceller. Thus, no additional hardware is required for the echo canceller.

# 4. PERFORMANCE EVALUATION

The performance of the proposed algorithm as a mobile hands-free echo canceller has been evaluated by both computer simulations using real signals and measurements using a V830 evaluation board. The accuracy of the noise power estimation and its influence on the convergence speed have been evaluated by computer simulations, because measuring an internal parameter of the hardware is difficult.

#### 4.1. Computer Simulations

Simulations have been carried out using an echo and noise signals recorded in a car. The reference input signal was a female speech signal. Noise1 was the idling noise of a car with a noisy diesel engine. Noise2 was recorded in a moving car and contains engine noise, wind noise and brake noise. These signals were sampled at 8 kHz and converted into a 16-bit integer format. Fig. 4 depicts the echo and the noise power.

To demonstrate the robustness of the proposed algorithm against noise, two conventional noise-robust algorithms, NSG-GALS [2] and TVS-NLMS [3], have been compared with the proposed algorithm. The parameters were optimized to achieve the largest ERLE (Echo Return Loss Enhancement). The parameters for the proposed algorithm were set to  $N = 512$ ,  $\mu_0 = 0.2$ ,  $P_0 = 10^5$ ,  $\alpha = 0.1$  and  $\beta = 0.9985$ . For NSG-GALS,  $\mu_0 = 0.1$ ,  $\rho = 1.0 \times 10^{-4}$ ,  $\alpha = 2.0 \times 10^{-5}$  and  $\beta = 0.0001$  were used. For TVS-NLMS,  $\rho = 0.01$  and  $\varepsilon = 1.0 \times 10^6$  were chosen.

Fig. 7. ERLE for Noise2.

The estimated noise power is shown in Fig. 5. The proposed noise estimator yields correct estimates. During the first 5 seconds of the simulation, the conventional algorithm failed to estimate the noise power: its estimate was about 100 times larger than the actual noise power and almost equal to the echo power.

Figure 6 shows the ERLE for Noise1. The proposed algorithm converges faster than the conventional algorithm. Thanks to the improved noise-power estimator, the ERLE of the proposed algorithm is almost 5 dB larger than that of the conventional algorithm.

Figure 7 compares the ERLE for the proposed algorithm with those for NSG-GALS and TVS-NLMS. The proposed algorithm reduces echoes by almost 15 dB even with a low echo-to-noise ratio. The conventional algorithms fail to reduce echoes.
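The paper does not spell out how ERLE is computed. A minimal sketch using the usual definition, the ratio of echo power to residual echo power in decibels, is given below; the function name is illustrative, and averaging over the whole input segment is an assumption, since the averaging window behind Figs. 6 and 7 is not stated.

```python
import numpy as np


def erle_db(echo, residual):
    """ERLE in dB: echo power over residual echo power, averaged over the input."""
    echo = np.asarray(echo, dtype=np.float64)
    residual = np.asarray(residual, dtype=np.float64)
    return 10.0 * np.log10(np.mean(echo ** 2) / np.mean(residual ** 2))


# A residual ten times smaller in amplitude corresponds to 20 dB.
example = erle_db([10, -10, 10, -10], [1, -1, 1, -1])
```

On this scale, an ERLE of 15 dB corresponds to reducing the echo power by a factor of about 32, and a 5 dB difference to a factor of about 3.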



Fig. 8. V830 evaluation board.



Fig. 9. Measured signal power for double-talk case.

#### 4.2. Measurements

The performance of the proposed algorithm has been measured using the V830 evaluation board shown in Fig. 8. The signal powers before and after echo cancellation for a double-talk case are compared in Fig. 9. The implemented echo canceller successfully reduces echoes without double-talk detection.

Figure 10 depicts the measured power spectrum of the residual echoes in a noisy environment with Noise1. The ERLE is almost 15 dB between 1000 Hz and 2000 Hz, where the echo is dominant. A peak around 600 Hz is mainly generated by the noise. The power spectrum of the residual echo is similar to that of the noise.

# 5. CONCLUSION

A noise-robust, fast-convergence echo canceller and its implementation on the V830 multimedia RISC have been presented. To avoid the influence of immature filter coefficients, the noise-power estimator is controlled by the reference input power rather than the echo replica power. This algorithm has been implemented on the V830 multimedia RISC as middleware for a mobile hands-free echo canceller. V830 provides performance comparable to a DSP and extended flexibility, while its power consumption is lower than that of a DSP. A low-cost, compact echo canceller has been realized by sharing its hardware with speech recognition middleware for a car navigation system. Results of computer simulations and measurements show fast convergence and robustness against disturbances such as noise and double-talk, even without double-talk detection.



Fig. 10. Power spectrum of residual echo and noise.

# ACKNOWLEDGMENTS

The authors are indebted to Akihiko Sugiyama, a principal researcher of C&C Media Research Laboratories, NEC Corporation, for helpful discussions on adaptation algorithms. They also wish to thank Kouhei Nadehara, a research engineer of C&C Media Research Laboratories, NEC Corporation, for his valuable comments on code optimization for V830. The authors wish to express their gratitude to Hideyuki Takahashi, the manager of Tokyo LSI center, NEC IC Microcomputer systems, Ltd., Kazuyoshi Kuwahara and Shinichi Inose, members of Semiconductor solution engineering division, NEC Corporation, for their guidance and continuous encouragement.

# REFERENCES

- [1] A. Hirano et al., "A Noise-Robust Stochastic Gradient Algorithm With An Adaptive Step-Size Suitable for Mobile Hands-Free Telephones," Proc. of ICASSP '95, pp. 1392-1395, 1995.
- [2] A. Sugiyama, "An Interference-Robust Stochastic Gradient Algorithm with a Gradient-Adaptive Step-Size," Proc. of ICASSP '93, vol. 3, pp. 539-542, 1993.
- [3] H. P. Meana et al., "A Time Varying Step Size Normalized LMS Echo Canceler Algorithm," Proc. of ICASSP '94, vol. 2, pp. 249-252, 1994.
- [4] J. Nagumo et al., "A Learning Method for System Identification," IEEE Trans. on Automatic Control, Vol. AC-12, No. 3, pp. 282-287, 1967.
- [5] K. Nadehara et al., "Low-Power Multimedia RISC," IEEE MICRO Magazine, Vol. 15, No. 6, pp. 20-29, Dec. 1995.
- [6] J. Hennessy et al., "Computer Architecture: A Quantitative Approach," Morgan Kaufmann Publishers, Inc., San Francisco, CA, USA, 1996.
- [7] "μPD77018A, 77019 data sheet," Document No. U11849EJ2V0DS00, NEC Corporation, 1997.