SP-1.1

Analysis by Synthesis Speech Coding with Generalized Pitch Prediction
Paul Mermelstein, Yasheng Qian (INRS-Telecommunications, Universite du Quebec)

A new analysis-by-synthesis speech coding structure is presented for high-quality speech coding in the 4 to 8 kb/s range. CELP with generalized pitch prediction (GPP-CELP) differs from classical code-excited linear prediction (CELP) in that for voiced segments it is the speech signal that is decomposed into a component predictable with the aid of the adaptive codebook (ACB) and a nonpredictable aperiodic component, not the LPC residual. The spectrum of the aperiodic component is estimated by linear-prediction analysis. An approximation to the aperiodic component is synthesized from a stochastic codebook of sparse pulse sequences and its spectrum is shaped by the LPC synthesis filter. The ACB contains samples of the past reconstructed signal, low-passed to increase the pitch prediction gain. For voiced segments the new structure yields higher pitch prediction gain and lower linear-prediction gain than classical CELP. Subjective and objective comparisons reveal significant advantages for GPP-CELP over classical CELP.

SP-1.2

A 16, 24, 32 kbit/s Wideband Speech Codec Based on ATCELP
Pierre Combescure (France Telecom CNET, DIH/DIPS, France), Juergen Schnitzler (Aachen University of Technology, IND, Germany), Kyrill Fischer, Ralf Kirchherr (Deutsche Telekom Berkom, Germany), Claude Lamblin, Alain Le Guyader, Dominique Massaloux, Catherine Quinquis (France Telecom CNET, DIH/DIPS, France), Joachim Stegmann (Deutsche Telekom Berkom, Germany), Peter Vary (Aachen University of Technology, IND, Germany)

This paper describes a combined Adaptive Transform Codec (ATC) and Code-Excited Linear Prediction (CELP) algorithm, called ATCELP, for the compression of wideband (7 kHz) signals. The CELP algorithm applies mainly to speech, whereas the ATC mode is selected for music and noise signals. We propose a switching scheme between CELP and ATC mode and describe a frame erasure concealment technique. Subjective listening tests have shown that the ATCELP codec at bit rates of 16, 24 and 32 kbit/s achieves performance close to that of CCITT G.722 at 48, 56 and 64 kbit/s, respectively, under most operating conditions.

SP-1.3

A 6.1 to 13.3-kb/s Variable Rate CELP Codec (VR-CELP) for AMR Speech Coding
Stefan Heinen, Marc Adrat (Institute of Communication Systems and Data Processing, Aachen University of Technology, Germany), Oliver Steil, Peter Vary (Institute of Communication Systems and Data Processing, Aachen University of Technology, Germany), Wen Xu (Department of Mobile Phone Development, Siemens AG, Hofmannstr. 51, 81359 Munich, Germany)

We propose a new 6.1 to 13.3-kb/s speech codec called variable rate code-excited linear prediction (VR-CELP) for Adaptive Multi-Rate (AMR) transmission over mobile radio channels such as GSM or UMTS. The AMR concept allows operation with almost wireline speech quality under poor channel conditions and better quality under good channel conditions. This is achieved by dynamically splitting the gross bit rate of the transmission system between source and channel coding according to the current channel conditions. The source coding scheme must therefore be designed for seamless switching between rates without annoying artifacts. To enhance the transmission quality under very poor channel conditions, a powerful new error concealment strategy based on estimation theory is applied.
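
The rate-splitting idea in this abstract can be illustrated with a minimal sketch. It is not the authors' scheme: the function name, the channel-quality scale, the threshold value and the 22.8 kb/s gross rate used in the usage note are all assumptions made for illustration. Only the 6.1 and 13.3 kb/s source rates come from the abstract.

```python
def split_gross_rate(gross_kbps, source_modes_kbps, channel_quality, thresholds):
    """Pick a source-coding rate and give the remainder to channel coding.

    Hypothetical illustration only:
    - source_modes_kbps: available source rates (any order)
    - channel_quality: a number where larger means a better channel
    - thresholds: ascending quality thresholds, one fewer than the modes
    """
    if len(thresholds) != len(source_modes_kbps) - 1:
        raise ValueError("need exactly len(modes) - 1 thresholds")

    modes = sorted(source_modes_kbps)
    # Each threshold the channel quality reaches moves us one mode upward,
    # i.e. a better channel gets more source bits and less protection.
    mode_index = sum(1 for t in thresholds if channel_quality >= t)

    source_kbps = modes[mode_index]
    channel_kbps = gross_kbps - source_kbps
    return source_kbps, channel_kbps
```

With an assumed gross rate of 22.8 kb/s and a single assumed threshold of 0.5, `split_gross_rate(22.8, [6.1, 13.3], 0.2, [0.5])` selects the 6.1 kb/s source mode and leaves the rest for channel coding, while a quality of 0.9 selects the 13.3 kb/s mode.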

SP-1.4

CELP Speech Coding Based on an Adaptive Pulse Position Codebook
Tadashi Amada, Kimio Miseki, Masami Akamine (Toshiba Corporation)

CELP coders that use pulse codebooks for excitation, such as ACELP, have the advantages of low complexity and high speech quality. At low bit rates, however, the reduction in the number of pulse position candidates and in the number of pulses degrades the reconstructed speech quality. This paper describes a method for adaptively allocating pulse position candidates. In the proposed method, N efficient pulse position candidates are selected from all possible positions in a subframe. The amplitude envelope of the adaptive code vector is used to select these N candidates: the larger the amplitude, the more pulse positions are assigned. Because the adaptation is derived from the adaptive code vector, the proposed method requires no additional bits. Experimental results show that the proposed method increases WSNRseg by 0.3 dB and MOS by 0.15.
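
A minimal sketch of envelope-driven candidate selection follows. It is one possible reading of the abstract, not the authors' algorithm: the moving-average envelope, the `half_width` parameter and the rule of simply keeping the N positions with the largest envelope are assumptions.

```python
def amplitude_envelope(vec, half_width=2):
    """Moving average of |vec| over a window of up to 2*half_width + 1 samples.

    The smoothing rule is an assumption; the abstract does not specify
    how the envelope is computed.
    """
    n = len(vec)
    env = []
    for i in range(n):
        lo = max(0, i - half_width)
        hi = min(n, i + half_width + 1)
        window = vec[lo:hi]
        env.append(sum(abs(x) for x in window) / len(window))
    return env


def select_pulse_positions(adaptive_vec, n_candidates, half_width=2):
    """Return the n_candidates subframe positions with the largest envelope.

    Ties are broken in favour of the earlier position. Regions where the
    adaptive code vector is large therefore receive more candidates.
    """
    env = amplitude_envelope(adaptive_vec, half_width)
    ranked = sorted(range(len(env)), key=lambda i: (-env[i], i))
    return sorted(ranked[:n_candidates])
```

Since a decoder can run the same selection on its own copy of the adaptive code vector, the chosen positions need not be transmitted, which is consistent with the abstract's statement that no additional bits are required.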

SP-1.5

A Multistage Search of Algebraic CELP Codebooks
Miguel A. Ramírez, Max Gerken (Electronics Eng. Dept. - Escola Politécnica, University of São Paulo)

A joint amplitude and position search procedure is proposed for searching algebraic multipulse codebooks. It is implemented within the reference G.723.1 codec as an example. Over an extensive speech database, this joint search method is shown to reduce the number of comparisons per subframe to one third of that required by the focused search. An efficient implementation of the joint search is derived which incorporates backward filtering of the residual target vector and precomputation of autocorrelation elements, bringing about a reduction in complexity of one third in comparison to the focused search. The joint search performs about one thirtieth as many comparisons as the full position search.

SP-1.6

A Fast Search Method Of Algebraic Codebook By Reordering Search Sequence
Nam Kyu Ha (SK Teletech Co. Ltd.)

This paper proposes a fast search method for the algebraic codebook in CELP coders. In the proposed method, the codebook search sequence is reordered according to the mean-squared weighted error between the target vector and the filtered adaptive codebook vector, and the algebraic codebook is searched until a predefined threshold is satisfied. This method reduces the computation considerably compared with G.729 at the expense of a slight degradation of speech quality. Moreover, it gives better speech quality with a smaller average search space than G.729A.
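
The general pattern of a reordered search with early termination can be sketched as follows. This is a generic illustration, not the method of the paper: the `priority` values stand in for whatever ordering criterion is used, and `score`, `threshold` and the function name are hypothetical.

```python
def reordered_search(candidates, priority, score, threshold):
    """Search candidates in descending priority and stop early.

    Hypothetical illustration:
    - priority[i]: ordering value for candidates[i] (larger is tried first)
    - score(c): match quality of candidate c (larger is better)
    - threshold: the search stops once the best score reaches this value

    Returns (best_index, best_score, number_of_candidates_evaluated).
    """
    order = sorted(range(len(candidates)), key=lambda i: -priority[i])

    best_index, best_score, evaluated = None, float("-inf"), 0
    for i in order:
        s = score(candidates[i])
        evaluated += 1
        if s > best_score:
            best_index, best_score = i, s
        if best_score >= threshold:
            break  # good enough: skip the remaining candidates
    return best_index, best_score, evaluated
```

When the ordering places promising candidates first, the threshold is usually reached after a few evaluations, so the average search space shrinks at the cost of occasionally missing the best candidate.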

SP-1.7

An 8 kbit/s ACELP Coder with Improved Background Noise Performance
Roar Hagen, Erik Ekudden (Ericsson Radio Systems AB)

This paper describes an 8 kbit/s ACELP speech coder with high performance for both speech and non-speech signals such as background noise. While the traditional waveform-matching LPAS structure employed in many existing speech coders provides high quality for speech signals, it has significant performance limitations for signals such as background noise. The coder presented here employs a novel adaptive gain coding technique that uses energy matching in combination with a traditional waveform-matching criterion, providing high quality for both speech and background noise. The coder has a basic structure similar to that of the 7.4 kbit/s D-AMPS EFR coder, with a 10th-order LPC, a high-resolution adaptive codebook and a 4-pulse algebraic codebook. The performance for speech signals is equivalent to or better than that of state-of-the-art 8 kbit/s coders, while for background noise conditions the performance is significantly improved.

SP-1.8

On Phase Perception in Speech
Harald Pobloth, W. Bastiaan Kleijn (Royal Institute of Technology, Stockholm)

In this paper we define perceptual phase capacity as the size of a codebook of phase spectra necessary to represent all possible phase spectra in a perceptually accurate manner. We determine the perceptual phase capacity for voiced speech. For this purpose, we use an auditory model which indicates whether phase spectrum changes are audible. The model was adjusted and its correct performance verified by listening tests. The perceptual phase capacity in low-pitched speech is found to be much higher than in high-pitched speech. Our results are consistent with the well-known fact that speech coding schemes which preserve the phase accurately work better for male voices, while coders which put more weight on the amplitude spectrum of the speech signal result in better quality for female speech.

Last Update: February 4, 1999 Ingo Höntsch