# LOW COMPLEXITY IMPLEMENTATION OF CARRIER AND SYMBOL TIMING SYNCHRONIZATION FOR A FULLY DIGITAL DOWNHOLE TELEMETRY SYSTEM

*Yifei Yang*<sup>†\*</sup>, *Joni Polili Lie*<sup>†</sup>, and *Alberto Quintero*<sup>†</sup>

<sup>†</sup>Wireline & Perforating, Halliburton Company

\* Department of Electrical and Computer Engineering, National University of Singapore, Singapore

## ABSTRACT

This paper presents a low-complexity implementation of carrier phase and symbol timing synchronization on an OMAP-L137 DSP-ARM dual-core processor for an M-ary phase-shift keying (M-PSK) based fully digital downhole telemetry system. The synchronization uses a data-aided (DA) algorithm that exploits the pilot symbols (PS) in each transmission to retrieve the timing epoch and carrier phase. Its implementation, together with the rest of the modem functionality, uses only the optimized TMS320C67x DSP and Math libraries to shorten development time and reduce implementation complexity. Simulation results confirm that its timing error variance approaches the modified Cramér-Rao bound (MCRB). Moreover, a real-time implementation on a mono-conductor telemetry system employing quadrature phase-shift keying (QPSK) demonstrates its effectiveness in a real-world environment.

*Index Terms*— synchronization, timing recovery, MPSK, carrier phase estimation, signal processing in digital receivers

## 1. INTRODUCTION

In a digital telemetry system, the generation and reception of communication signals, traditionally implemented in analog hardware, has moved to digital signal processing, firmware, and software running on embedded systems. To maintain the exchange of information and a valid data flow, such synchronous communication systems require acquisition and tracking of the carrier phase and symbol timing.

To achieve synchronization, a traditional analog front end requires a reliable analog phase-locked loop (PLL) in its timing recovery circuit, which may be subject to timing jitter depending on the PLL implementation [1]. In contrast, a digital implementation using a numerically controlled oscillator (NCO) avoids the jitter of analog PLL circuitry because it generates a waveform with a numerically precise frequency. Moreover, a fully digital implementation also replaces analog matched filtering (MF) with its digital counterpart by placing the analog-to-digital converter ahead of the MF [2].

With the transition from analog to digital telemetry, synchronization is no longer constrained by the reliability of analog circuitry. Moreover, synchronization can be improved by using *a priori* information in the form of PS transmitted during the training stage [2, 3]. Alternatively, when *a priori* information is unavailable, non-data-aided (NDA) synchronization schemes exploit information from the random data symbols [4]. Multiple NDA as well as DA schemes have been reported in the literature [5–10].

This paper discusses the implementation of carrier phase and symbol timing synchronization for a fully digital telemetry system. The synchronization is based on the transmission of known *a priori* PS, sampled at a multiple of the symbol rate. The symbol timing mismatch is compensated by shifting the discretized signal fed to the digital MF by an integer number of samples. Such an implementation is common in systems that sample at a frequency much higher than the carrier frequency.

The remainder of this paper is organized as follows. Section 2 explains the implemented digital receiver architecture, which allows the MF to be realized as a dot product. Section 3 describes the low-complexity implementation of the carrier phase and symbol timing recovery algorithms. Section 4 presents simulation results for the normalized timing error variance and real-time experimental results from a mono-cable telemetry system. Section 5 concludes the paper.

## 2. FULLY DIGITAL RECEIVER ARCHITECTURE

In a classical analog receiver, an analog MF is implemented at the receiver side, and its output is sampled at the optimal instant by adjusting the phase of a voltage-controlled oscillator (VCO) by means of a PLL. Analog MFs are also commonly found in digital receivers [2, 6]. However, this structure relies heavily on the PLL circuitry to produce an accurately timed MF output, so the receiver is not considered fully digital. A fully digital receiver instead samples the received signal directly, which requires oversampling at a rate higher than the Nyquist rate.

Two approaches to timing recovery with non-synchronized sampling are reported in the literature [11]. The first limits the sampling rate to two samples per symbol and compensates for the resulting MF accuracy loss by interpolating the sampled signal to a much higher rate to obtain the optimal timing instant. The MF output is then downsampled to the symbol rate, and only the optimal data point after interpolation is selected according to the estimated symbol timing. Even though a full interpolation filter is not required, the performance of a receiver employing this approach can vary depending on the implementation of the interpolation filter. The second approach avoids the interpolation filter by oversampling at a much higher rate and then decimating the MF output to the symbol rate. Timing recovery is then achieved through the choice of the starting index of the input to the digital MF.

The fully digital receiver architecture considered in this paper is based on the second approach, as it is more practical for downhole telemetry applications, for two reasons. First, the higher sampling rate allows the telemetry system to perform the down-conversion digitally, jointly with the MF. This avoids the need for a circuit-based down-converter and satisfies the Nyquist sampling requirement even when the selected carrier frequency is much higher than the symbol rate. Second, given the uncertainty of the usable channel spectrum caused by noise and interference on the transmission line, the high sampling rate provides the flexibility to shift the carrier frequency on the fly. Fig. 1 shows the generic system model of the fully digital receiver, where $r(t)$ denotes the analog received signal sampled at frequency $F_s$. The discrete received signal $r[n]$ is fed to the digital MF with the delay $\hat{\tau} = \hat{q}_{\mathrm{sym}} T_s$ as the symbol timing estimate, where $T_s$ is the sampling period and $\hat{q}_{\mathrm{sym}}$ is the estimated sample index within one symbol (i.e., $0 \leq \hat{q}_{\mathrm{sym}} \leq N_b - 1$, where $N_b$ is the number of samples per symbol). At the output of the MF, the signal is decimated to the symbol rate before a phase rotation based on the estimated phase $\hat{\theta}$ is applied.

Fig. 1. Generic system model of a fully digital receiver

Fig. 2. Matched filter implemented with polyphase decimation filter

The digital MF and decimation shown in Fig. 1 can be further simplified using the polyphase decimation filter structure shown in Fig. 2. This structure reduces the effective computational complexity to that of symbol-rate sampling, since only the MF outputs retained after decimation are computed. Each polyphase decimator output is simply the dot product between the time-shifted received signal and a vector stored in a look-up table (LUT). As the dot product is a common digital signal processing (DSP) kernel, optimized code for it is available in the TMS320C67x DSP Library (DSPF\_sp\_dotprod).
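As a minimal sketch of the polyphase decimator described above, the Python code below premixes the pulse $g(n)$ with the carrier into two LUT vectors, so that one dot product per symbol performs down-conversion and matched filtering together, and only one output per symbol is computed. It assumes a symmetric pulse (so the time-reversed MF equals $g$) and an integer number of carrier cycles per symbol; the plain-Python `dotprod` only stands in for an optimized kernel such as DSPF\_sp\_dotprod, and all function names are hypothetical.

```python
import math

def make_lut(g, f_c, f_s):
    """Premix the (symmetric) MF pulse with the carrier: one LUT for I, one for Q."""
    lut_i = [g[n] * math.cos(2 * math.pi * f_c / f_s * n) for n in range(len(g))]
    lut_q = [-g[n] * math.sin(2 * math.pi * f_c / f_s * n) for n in range(len(g))]
    return lut_i, lut_q

def dotprod(x, y):
    """Plain-Python stand-in for an optimized single-precision dot-product kernel."""
    return sum(a * b for a, b in zip(x, y))

def polyphase_decimate(r, lut_i, lut_q, q_sym, n_b, n_symbols):
    """Compute only the symbol-rate MF outputs v_I + j v_Q, starting at sample q_sym."""
    taps = len(lut_i)
    out = []
    for i in range(n_symbols):
        start = q_sym + i * n_b            # time-shifted input window for symbol i
        window = r[start:start + taps]
        if len(window) < taps:             # not enough received samples left
            break
        out.append(complex(dotprod(window, lut_i), dotprod(window, lut_q)))
    return out
```

For an isolated symbol $c_i$ and a unit-energy pulse, each output is approximately $c_i/2$; the factor $1/2$ comes from the mixing and the double-frequency terms largely cancel when the window holds whole carrier cycles.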

Similarly, the same simplification can be applied to the transmitter, so that *M*-PSK modulation with pulse shaping is realized as a symbol-rate convolution. Its convolution buffer is then updated by a weighted vector sum (DSPF\_sp\_w\_vec) of the buffer and the LUT-stored transmit waveform.
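The transmitter-side update can be sketched as an overlap-add at the symbol rate. In this minimal sketch, each symbol scales two LUT waveforms (the pulse premixed with cosine and sine) and adds them into a buffer through a weighted vector sum $y = m\,x_1 + x_2$, which is how we understand DSPF\_sp\_w\_vec to behave; here it is replaced by a plain-Python `w_vec`. The class name, the requirement that `len(g)` be a multiple of $N_b$, and the integer number of carrier cycles per symbol are assumptions made for illustration.

```python
import math

def w_vec(x1, x2, m):
    """Weighted vector sum y[i] = m * x1[i] + x2[i] (plain-Python stand-in)."""
    return [m * a + b for a, b in zip(x1, x2)]

class SymbolRateModulator:
    """Hypothetical symbol-rate M-PSK modulator with passband pulse LUTs.

    Assumes len(g) is a multiple of n_b and f_c * n_b / f_s is an integer, so the
    same premixed LUTs are valid for every symbol.
    """

    def __init__(self, g, n_b, f_c, f_s):
        w = 2 * math.pi * f_c / f_s
        self.n_b = n_b
        self.lut_cos = [g[k] * math.cos(w * k) for k in range(len(g))]
        self.lut_sin = [g[k] * math.sin(w * k) for k in range(len(g))]
        self.buf = [0.0] * len(g)          # overlap buffer of future output samples

    def push(self, c):
        """Add symbol c (complex) and return the n_b samples that are now final."""
        # c_i(f_c) = Re{c} cos(.) - Im{c} sin(.), applied as two weighted vector sums
        self.buf = w_vec(self.lut_cos, self.buf, c.real)
        self.buf = w_vec(self.lut_sin, self.buf, -c.imag)
        out = self.buf[:self.n_b]
        self.buf = self.buf[self.n_b:] + [0.0] * self.n_b   # advance by one symbol
        return out
```

Each call to `push` costs two vector updates of length `len(g)`, independent of $N_b$ times the symbol count, which is the point of the symbol-rate formulation.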

## 3. SYNCHRONIZATION ALGORITHMS

### 3.1. System Overview

Consider the baseband transmitted signal of a single-carrier transmission given by:

$$s(n) = \sum_{i=-\infty}^{\infty} c_i g(n - N_b i), \qquad (1)$$

where $c_i$ is the complex *M*-PSK symbol, upsampled to $N_b$ samples per symbol, and $g(n)$ is a real-valued, unit-energy pulse-shaping function with $N_b$ samples per symbol, implemented digitally. The indices $i$ and $n$ are the symbol and sample indices, respectively. The signal is then up-converted to the carrier frequency and transmitted through an additive white Gaussian noise (AWGN) channel. With a fully digital transmitter architecture, the transmitted signal at carrier frequency $f_c$ can be expressed as in (1), with the complex parameter $c_i$ replaced by the real-valued $c_i(f_c) = \Re\{c_i\} \cos(2\pi n f_c/f_s) - \Im\{c_i\} \sin(2\pi n f_c/f_s)$, where $f_s$ is the sampling rate of the digital-to-analog converter. The operators $\Re\{\cdot\}$ and $\Im\{\cdot\}$ denote the real and imaginary parts of a complex value. At the receiver side, the received signal is sampled directly without down-conversion to baseband:

$$r(n) = \exp(j\theta) \sum_{i=-\infty}^{\infty} c_i(f_c)g\left(n - N_b i - \left\lfloor\frac{\tau}{T_s}\right\rfloor\right) + \nu(n), \quad (2)$$

where $\theta$ and $\tau$ are the unknown carrier phase and symbol timing, respectively, and $\nu(n)$ is modelled as AWGN. When synchronized, the effects of $\theta$ and $\tau$ are compensated by phase rotation and sample shift, respectively. Next, the decision-feedback based Costas loop used to estimate and track $\theta$ is described, assuming that the symbol timing $\tau$ is no longer unknown but is replaced by the nearest sample index estimate, $\hat{\tau} = \hat{q}_{\text{sym}} T_s$. The time interval required for phase estimation is assumed to be short enough not to violate the cyclostationarity assumption that guarantees $\theta$ remains constant. Then, the DA symbol timing recovery is described, which solves for the nearest sample index $q \equiv n \pmod{N_b}$ that aligns the polyphase decimator with the symbol duration.

### 3.2. Decision-Feedback Based Costas Loop

The Costas loop originates from PLL circuitry and aims to track the carrier phase of the received signal. In the proposed algorithm, we adopt the decision-feedback based digital *M*-PSK Costas loop. In the fully digital receiver, the Costas loop tracks the phase measured at the output of the polyphase decimator, calculated as

$$\phi(k) = \arctan\left(\frac{\tilde{v}_Q(k)}{\tilde{v}_I(k)}\right),\tag{3}$$

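where $k$ is the iteration index and $[\tilde{v}_I(k), \tilde{v}_Q(k)]$ are the in-phase and quadrature components of the polyphase decimator output, calculated from the sampled and time-shifted received signal $r(n - \hat{q}_{\mathrm{sym}})$. Since the *M*-PSK symbol phases span all four quadrants, $\arctan(\cdot)$ in (3) is understood as the four-quadrant arctangent of $(\tilde{v}_Q, \tilde{v}_I)$.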

The Costas loop iteration proceeds as follows:

1. Initialize the phase estimate $\hat{\theta}(0)$ and the step size $\mu$.
2. Calculate $\phi(k)$ using (3).
3. Update the phase estimate $\hat{\theta}(k)$:

   $$\hat{\theta}(k) = \hat{\theta}(k-1) + \mu \varepsilon(k), \tag{4}$$

   $$\varepsilon(k) = \phi(k) - \varphi(\kappa), \tag{5}$$

   $$\kappa = \arg\min_{m} |\phi(k) - \varphi(m)|, \tag{6}$$

   where $\kappa$ is the index of the *M*-PSK symbol whose phase is nearest to the measured phase, and $\varphi(m)$ denotes the phase of the $m$-th symbol (e.g., the QPSK symbol phases are $\{-\frac{3}{4}\pi, -\frac{\pi}{4}, \frac{\pi}{4}, \frac{3}{4}\pi\}$).
4. Terminate the iteration if converged. Otherwise, increment $k$ and repeat the iteration from Step 2.
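Steps 1–4 can be sketched for QPSK as follows. This is a simplified model, not the DSP implementation: each decimator output is assumed to be a random QPSK symbol rotated by the residual phase $\theta - \hat{\theta}(k-1)$ plus optional complex Gaussian noise, `math.atan2` plays the role of the four-quadrant arctangent in (3), and the phase difference in (5) and (6) is wrapped to $[-\pi, \pi)$, which is our assumption since the text does not state how wrap-around is handled. The function `costas_dd` and its parameters are hypothetical.

```python
import cmath
import math
import random

# QPSK symbol phases phi(m) as listed in the text
QPSK_PHASES = [-3 * math.pi / 4, -math.pi / 4, math.pi / 4, 3 * math.pi / 4]

def wrap(angle):
    """Wrap an angle to [-pi, pi)."""
    return (angle + math.pi) % (2 * math.pi) - math.pi

def costas_dd(theta_true, mu=0.1, max_iter=500, tol=1e-9, noise_std=0.0, seed=0):
    """Decision-feedback Costas loop (Steps 1-4) on a simulated QPSK stream."""
    rng = random.Random(seed)
    theta_hat = 0.0                                   # Step 1: theta_hat(0)
    for _ in range(max_iter):
        c = cmath.exp(1j * rng.choice(QPSK_PHASES))   # transmitted QPSK symbol
        noise = complex(rng.gauss(0.0, noise_std), rng.gauss(0.0, noise_std))
        # decimator output after de-rotation by the previous estimate (model assumption)
        v = c * cmath.exp(1j * (theta_true - theta_hat)) + noise
        phi = math.atan2(v.imag, v.real)              # Step 2: eq. (3), four-quadrant
        kappa = min(range(len(QPSK_PHASES)),
                    key=lambda m: abs(wrap(phi - QPSK_PHASES[m])))   # eq. (6)
        eps = wrap(phi - QPSK_PHASES[kappa])          # eq. (5), wrapped
        theta_hat += mu * eps                         # Step 3: eq. (4)
        if abs(eps) < tol:                            # Step 4: convergence check
            break
    return theta_hat
```

In the noise-free case the residual error shrinks by a factor $(1-\mu)$ per iteration. Like any decision-directed QPSK loop, the sketch resolves $\theta$ only modulo $\pi/2$, so the initial phase error is assumed to be smaller than $\pi/4$.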

The Costas loop iteration can be initialized with $\hat{q}_{\mathrm{sym}} = 0$, and the iteration is expected to converge as the symbol timing recovery algorithm updates its estimate. Upon convergence, the iteration is terminated and resumed when a phase update is required due to clock drift or carrier frequency mismatch. To account for the quantization error in the symbol timing, the carrier phase at the final iteration can include the phase error component $\varepsilon(k)$.

### 3.3. Data-Aided Symbol Timing Recovery

The goal of symbol timing recovery is to find the optimal timing offset within the symbol period for demodulation. The optimal symbol sampling time yields the highest signal-to-noise ratio (SNR) and the minimum inter-symbol interference (ISI). In a wireline telemetry system, the MF and down-conversion are performed on the discretized received signal. Hence, symbol timing estimation is implemented digitally by searching for the optimal sampling index over all possible shifts. The residual timing error can be compensated by the phase offset calculated by the Costas loop phase tracking.

The quantized symbol timing estimate can be computed from $M$ PS (here $M$ denotes the number of pilot symbols rather than the modulation order):

$$\hat{q}_{\text{sym}} = \operatorname*{arg\,max}_{0 \le q \le N_b - 1} \boldsymbol{v}^H(q) \boldsymbol{v}(q),\tag{7}$$

where $\boldsymbol{v}(q) = [\tilde{v}_I(n-q)+j\tilde{v}_Q(n-q), \ldots, \tilde{v}_I(n-q-(M-1)N_b)+ j\tilde{v}_Q(n-q-(M-1)N_b)]^T \in \mathbb{C}^{M\times 1}$. Implementing (7) involves evaluating the cost function for each of the $N_b$ possible time shifts and searching for the maximum, where each evaluation requires a complex vector multiplication. Because $\boldsymbol{v}^H(q)\boldsymbol{v}(q)$ is the product of a vector with its own conjugate transpose, it is a real scalar equal to the sum of squares of the $2M$ real and imaginary parts of $\boldsymbol{v}(q)$. This operation is also available as a DSP kernel (DSPF\_sp\_vecsum\_sq).
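A minimal sketch of the search in (7) follows. It assumes that the polyphase decimator outputs $\tilde{v}_I + j\tilde{v}_Q$ have already been evaluated at every sample offset and stored in a hypothetical list `y`, so that $v_m(q)$ is `y[n - q - m*n_b]`. Each cost flattens $\boldsymbol{v}(q)$ into $2M$ real values and sums their squares, the role the text assigns to DSPF\_sp\_vecsum\_sq, and the estimate is the index of the largest cost, as DSPF\_sp\_maxidx would provide. The plain-Python helpers and their names are illustrative only.

```python
def vecsum_sq(x):
    """Sum of squares of a real array (plain-Python stand-in for a DSP kernel)."""
    return sum(a * a for a in x)

def timing_cost(y, n, q, n_b, num_ps):
    """J(q) = v^H(q) v(q) of eq. (7), with v_m(q) = y[n - q - m*n_b]."""
    flat = []
    for m in range(num_ps):
        v = y[n - q - m * n_b]
        flat.extend((v.real, v.imag))      # 2*num_ps real elements
    return vecsum_sq(flat)

def estimate_timing(y, n, n_b, num_ps):
    """Return (q_hat_sym, costs), where q_hat_sym in [0, n_b - 1] maximizes J(q)."""
    if n - (n_b - 1) - (num_ps - 1) * n_b < 0:
        raise ValueError("not enough decimator outputs before reference index n")
    costs = [timing_cost(y, n, q, n_b, num_ps) for q in range(n_b)]
    q_hat = max(range(n_b), key=lambda q: costs[q])   # index of the largest cost
    return q_hat, costs
```

The loop over $q$ makes the $N_b$-fold cost of the exhaustive search explicit: each shift needs $M$ decimator outputs and one sum of squares.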

After obtaining the $N_b$ cost values, the symbol timing estimate is the index of the largest value, which is found using the function DSPF\_sp\_maxidx. To improve robustness against measurement noise at the polyphase decimator output, the objective function of (7), $J(q) \triangleq \boldsymbol{v}^H(q)\boldsymbol{v}(q)$, can be modified to capture the variation in both the in-phase and quadrature components:

$$J_{\text{var}}(q) = \boldsymbol{v}^{H}(q)\boldsymbol{v}(q) - \alpha \text{Var}\left[\left\{v_{m}(q)\right\}\right], \qquad (8)$$

where $\alpha$ is a weight applied to the sample variance $\operatorname{Var}(\cdot)$ of the decimator outputs $v_m(q) = \tilde{v}_I(n-q-mN_b) + j\tilde{v}_Q(n-q-mN_b)$, $m = 0, \ldots, M-1$.
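The modified objective (8) can be sketched in the same way. The text does not spell out the variance definition, so this sketch assumes the unbiased sample variance of the complex outputs, $\frac{1}{M-1}\sum_m |v_m(q) - \bar{v}(q)|^2$, which equals the sum of the in-phase and quadrature variances. As before, `y` is a hypothetical list of decimator outputs at every sample offset, and the function names are illustrative.

```python
def timing_cost_var(y, n, q, n_b, num_ps, alpha):
    """J_var(q) of eq. (8) under an assumed unbiased complex sample variance."""
    v = [y[n - q - m * n_b] for m in range(num_ps)]
    energy = sum(abs(x) ** 2 for x in v)                      # v^H(q) v(q)
    mean = sum(v) / num_ps
    var = sum(abs(x - mean) ** 2 for x in v) / (num_ps - 1)   # I and Q variances summed
    return energy - alpha * var

def estimate_timing_var(y, n, n_b, num_ps, alpha):
    """Shift q in [0, n_b - 1] that maximizes J_var(q)."""
    costs = [timing_cost_var(y, n, q, n_b, num_ps, alpha) for q in range(n_b)]
    return max(range(n_b), key=lambda q: costs[q])
```

With $\alpha = 0$ the sketch reduces to (7); a positive $\alpha$ penalizes shifts whose outputs fluctuate strongly across the pilot block, even if their raw energy is slightly higher.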

Note that evaluating all $N_b$ possible sample shifts increases the computational load by a factor of $N_b$, which may violate real-time constraints. This can be avoided by extending the synchronization period, i.e., having the transmitter send additional synchronization-aiding symbols so that the evaluations can be spread over more symbol intervals.

## 4. SIMULATION AND EXPERIMENT RESULTS

This section presents simulation and experimental results for the algorithms described in Section 3. The system parameters are: (1) the discrete pulse-shaping function $g(n)$ of the MF is a root-raised-cosine function with a roll-off factor of 0.1; (2) the number of PS used for synchronization is 100; and (3) the modulation scheme is QPSK. The simulation considers serial data transmission over an AWGN channel, while the experiment uses a wireline telemetry system.

Fig. 3. Normalized timing variance with $N_b = 16$ and $N_b = 32$ under different objective functions
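The simulations use a root-raised-cosine $g(n)$ with roll-off 0.1. One common way to sample such a pulse is sketched below with the textbook closed-form RRC expression, normalized to unit energy as assumed in (1). The truncation span and the function name are assumptions, not parameters reported in the paper. The removable singularity at $t = \pm T/(4\beta)$ is handled explicitly; with $N_b = 16$ and $\beta = 0.1$ it falls exactly on samples $\pm 40$.

```python
import math

def rrc_pulse(beta, n_b, span):
    """Sampled root-raised-cosine pulse g(n): n_b samples per symbol, truncated to
    +/- span symbols and scaled to unit energy. Assumes 0 < beta <= 1."""
    taps = []
    for n in range(-span * n_b, span * n_b + 1):
        x = n / n_b                                   # time in symbol periods
        if n == 0:
            h = 1.0 - beta + 4.0 * beta / math.pi
        elif abs(abs(4.0 * beta * x) - 1.0) < 1e-9:   # removable singularity
            h = (beta / math.sqrt(2.0)) * (
                (1 + 2 / math.pi) * math.sin(math.pi / (4 * beta))
                + (1 - 2 / math.pi) * math.cos(math.pi / (4 * beta)))
        else:
            h = (math.sin(math.pi * x * (1 - beta))
                 + 4 * beta * x * math.cos(math.pi * x * (1 + beta))) / (
                math.pi * x * (1 - (4 * beta * x) ** 2))
        taps.append(h)
    energy = sum(h * h for h in taps)
    return [h / math.sqrt(energy) for h in taps]
```

A small roll-off such as 0.1 gives a pulse with slowly decaying tails, so the truncation span trades filter length against residual ISI.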

Fig. 3 shows simulation results for 16 and 32 samples per symbol ($N_b$) with three different objective functions. The modified Cramér-Rao bound (MCRB) [12], also shown, is given by

$$\mathrm{MCRB}(\tau) = \frac{T^2}{8\pi^2 \xi L_0} \frac{1}{E_s/N_0}, \tag{9}$$

where

$$\xi = \frac{\int_{-\infty}^{\infty} T^2 f^2 |G(f)|^2 \, df}{\int_{-\infty}^{\infty} |G(f)|^2 \, df}, \tag{10}$$

where $G(f)$ is the Fourier transform of $g(t)$, $L_0 T$ is the observation interval, and $T$ is the symbol period. The MCRB is a theoretical lower bound on the error variance achievable by any unbiased synchronizer. The timing epoch estimator can be regarded as unbiased, as its bias is negligible compared with the error standard deviation. The roll-off factor $\beta$ of $g(t)$ is 0.1 and $L_0$ is 100. Results for several maximum-likelihood (ML) based timing estimation algorithms [5] are also shown. As shown in Fig. 3, both the $N_b = 16$ and $N_b = 32$ cases outperform the ML synchronizer employing square-law nonlinearity (SLN) [10]. The $N_b = 16$ cases also outperform the absolute-value nonlinearity (AVN) and logarithmic nonlinearity (LOGN) cases and are very close to the MCRB. Three variants of the objective function (8) are used in the simulation, with $\alpha = 0, 1, 2$, and $\alpha = 2$ generally gives the best performance in this simulation setting.
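For a root-raised-cosine $g(t)$, $|G(f)|^2$ is a raised-cosine spectrum, and with $T = 1$ the ratio (10) evaluates in closed form to $\xi = \frac{1}{12} + \beta^2\left(\frac{1}{4} - \frac{2}{\pi^2}\right)$. The sketch below checks this by numerical integration and evaluates the normalized bound $\mathrm{MCRB}(\tau)/T^2$ from (9) for $\beta = 0.1$ and $L_0 = 100$. The function names are illustrative, and the $E_s/N_0$ value is only an example input.

```python
import math

def xi_raised_cosine(beta, n_grid=200000):
    """xi of eq. (10) by midpoint integration with T = 1, assuming |G(f)|^2 is a
    raised-cosine spectrum with roll-off 0 < beta <= 1."""
    def spectrum(f):
        f = abs(f)
        if f <= (1 - beta) / 2:
            return 1.0
        if f <= (1 + beta) / 2:
            return 0.5 * (1 + math.cos(math.pi / beta * (f - (1 - beta) / 2)))
        return 0.0
    edge = (1 + beta) / 2
    df = 2 * edge / n_grid
    num = den = 0.0
    for i in range(n_grid):
        f = -edge + (i + 0.5) * df
        p = spectrum(f)
        num += f * f * p
        den += p
    return num / den                                  # df cancels in the ratio

def xi_closed_form(beta):
    """Closed-form value of eq. (10) for a raised-cosine |G(f)|^2 (T = 1)."""
    return 1 / 12 + beta ** 2 * (0.25 - 2 / math.pi ** 2)

def mcrb_normalized(xi, l0, es_n0_db):
    """MCRB(tau) / T^2 from eq. (9), with Es/N0 given in dB."""
    es_n0 = 10 ** (es_n0_db / 10)
    return 1.0 / (8 * math.pi ** 2 * xi * l0 * es_n0)
```

For $\beta = 0.1$ the closed form gives $\xi \approx 0.0838$, only slightly above the value $1/12$ of an ideal rectangular spectrum, so the bound depends only weakly on such a small roll-off.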

Next, we describe the experimental setup used to verify the proposed joint timing and phase estimation; this setup is typical of downhole telemetry deployments in oil-well service applications. The setup is configured as a full-duplex power-line communication system. The transmission medium is a 7/32-inch mono-conductor cable rated up to 1.2 kV, whose length can be configured to 7,000, 13,000, 28,000, or 40,000 ft. One end of the cable connects to the uphole surface system, comprising the high-voltage power supply with a line filter for delivering high voltage to the downhole equipment, as well as the surface telemetry board for transmitting the downlink signal and receiving the uplink signal. Likewise, at the other end of the cable, the downhole telemetry board receives the downlink signal and transmits the uplink signal, with the line filter suppressing interference from the high-voltage downhole equipment. On both sides of the system, the telemetry transformer provides the interface between the high-voltage and low-voltage telemetry systems. Because the downlink carrier lies in the lower frequency band and the uplink carrier in the higher band, the system includes transmitter low-pass and high-pass filters for downlink and uplink transmission, respectively. Fig. 4 shows the block diagram of the downhole telemetry system setup.

Fig. 4. Block diagram of downhole telemetry system setup

The test is performed as follows. A discrete transmitted signal is generated by QPSK modulation of 24 PS followed by a randomly generated bit stream totaling approximately 10,000 symbols. The sampling rate is set to 100 kHz, and the downlink carrier frequency is 8 kHz. The signal is transmitted through a 7,000-ft mono-conductor cable. At the other end of the cable, the received signal is also sampled at 100 kHz and processed using the proposed algorithm to estimate the timing epoch and carrier phase offset. The estimated timing and phase are then used to demodulate the data following the PS. Fig. 5 compares the constellation diagrams of the demodulated data without synchronization and with the proposed synchronization and phase correction.

The corrected constellation diagram shows that the synchronization algorithm successfully recovers the phase offset and the timing epoch. The same test is repeated for the uplink signal with the carrier frequency set to 18 kHz, and the resulting constellation diagram shows similar behavior.



## 5. CONCLUSION

A practical synchronization algorithm has been implemented in a real-time downhole telemetry system. The *M*-PSK modem functionality, together with its synchronization, is implemented using only the optimized TMS320C67x DSP and Math libraries, which significantly shortens modem development time. Its effectiveness has been demonstrated through simulations and experiments in a real-world environment.

## 6. ACKNOWLEDGEMENT

The authors are grateful for the assistance of Jimmy Zhang, Colin Eu, and Roy Tan from Halliburton for making possible the prototype telemetry system described in this paper and would also like to thank Halliburton management as well as Singapore Economic Development Board for permission to publish this paper.

## 7. REFERENCES

- [1] X. Gao, E. A. M. Klumperink, P. F. J. Geraedts, and B. Nauta, "Jitter analysis and a benchmarking figure-of-merit for phase-locked loops," *IEEE Transactions on Circuits and Systems II: Express Briefs*, vol. 56, no. 2, pp. 117–121, 2009.
- [2] U. Mengali and A. N. D'Andrea, *Synchronization Techniques for Digital Receivers*, Springer US, Boston, MA, 1997.
- [3] J. W. M. Bergmans and H. W. Wong-Lam, "A class of data-aided timing-recovery schemes," *IEEE Transactions on Communications*, vol. 43, no. 2/3/4, pp. 1819–1827, 1995.
- [4] C. Herzet, N. Noels, V. Lottici, H. Wymeersch, M. Luise, M. Moeneclaey, and L. Vandendorpe, "Code-aided turbo synchronization," *Proceedings of the IEEE*, vol. 95, no. 6, pp. 1255–1271, Jun. 2007.
- [5] M. Morelli, A. N. D'Andrea, and U. Mengali, "Feedforward ML-based timing estimation with PSK signals," *IEEE Communications Letters*, vol. 1, no. 3, pp. 80–82, 1997.
- [6] R. Hamila, J. Vesma, and M. Renfors, "Polynomial-based maximum-likelihood technique for synchronization in digital receivers," *IEEE Transactions on Circuits and Systems II: Analog and Digital Signal Processing*, vol. 49, no. 8, pp. 567–576, 2002.
- [7] F. M. Gardner, *Demodulator Reference Recovery Techniques Suited for Digital Implementation*, European Space Agency, Final Report, ESTEC Contract No. 6847/86/NL/DG, Aug. 1988.
- [8] F. M. Gardner, "A BPSK/QPSK timing-error detector for sampled receivers," *IEEE Transactions on Communications*, vol. 34, no. 5, pp. 423–429, 1986.
- [9] K. Mueller and M. Muller, "Timing recovery in digital synchronous data receivers," *IEEE Transactions on Communications*, vol. 24, no. 5, pp. 516–531, 1976.
- [10] M. Oerder and H. Meyr, "Digital filter and square timing recovery," *IEEE Transactions on Communications*, vol. 36, no. 5, pp. 605–612, 1988.
- [11] J. H. Reed, *Software Radio: A Modern Approach to Radio Engineering*, Prentice Hall Professional, NJ, 2002.
- [12] A. N. D'Andrea, U. Mengali, and R. Reggiannini, "The modified Cramér-Rao bound and its application to synchronization problems," *IEEE Transactions on Communications*, vol. 42, no. 2/3/4, pp. 1391–1399, Feb. 1994.