Chair: Thomas E. Tremain, U.S. Department of Defense (USA)
B. Mouy, Thomson CSF-RGS (FRANCE)
P. de La Noue, Thomson CSF-RGS (FRANCE)
G. Goudezeune, Thomson CSF-RGS (FRANCE)
This paper presents a new voice coder for very low bit rate communication systems, standardized by NATO as STANAG 4479. The distinctive feature of this standard is that it specifies both the source coding and the channel coding. It is the natural successor to the well-known LPC10e 2400 bps voice coder standardized as STANAG 4198. The analysis and synthesis are the same as in the LPC10e vocoder, but the quantization process is specific to the new coder. The main innovations are presented. An associated error-correcting scheme raises the 800 bps source bit rate to a transmitted rate of 2400 bps. The scheme has been optimized in the framework of an HF-ECCM (HF Electronic Counter-Countermeasure) system to take into account all possible channel perturbations as well as system constraints, especially the minimization of delay and turn-around time.
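The rate figures in this abstract imply a simple bit budget, sketched below. The 800 bps and 2400 bps values come from the abstract; the 22.5 ms frame length is an assumption borrowed from LPC-10, and the function name `rate_budget` is hypothetical. The actual framing used by STANAG 4479 is not given here.

```python
from fractions import Fraction

SOURCE_BPS = 800     # source (vocoder) bit rate stated in the abstract
CHANNEL_BPS = 2400   # transmitted bit rate after error-correction coding
# ASSUMPTION: a 22.5 ms frame, as in LPC-10; not stated in the abstract.
FRAME_SECONDS = Fraction(225, 10000)


def rate_budget(source_bps, channel_bps, frame_seconds):
    """Return (code rate, information bits, redundancy bits) per frame."""
    code_rate = Fraction(source_bps, channel_bps)
    source_bits = source_bps * frame_seconds
    channel_bits = channel_bps * frame_seconds
    return code_rate, source_bits, channel_bits - source_bits


code_rate, info_bits, redundancy_bits = rate_budget(
    SOURCE_BPS, CHANNEL_BPS, FRAME_SECONDS
)
print(code_rate, info_bits, redundancy_bits)
```

Under these assumptions, two of every three transmitted bits are available for error protection.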
Masayuki Nishiguchi, Sony Corporation (JAPAN)
Jun Matsumoto, Sony Corporation (JAPAN)
An efficient coding scheme for Linear Predictive Coding (LPC) residuals is proposed based on harmonic and noise representation. New features of the scheme include classified vector quantization of the spectral envelope of LPC residuals with a weighted distortion measure. The improvement in performance obtained by classifying codebooks based on a voiced/unvoiced (V/UV) decision is shown. Sequences of the short-term rms power of time domain waveforms are also vector quantized and transmitted for unvoiced signals. A fast synthesis algorithm for voiced signals using an FFT is also presented, which reduces the high complexity of the direct sinusoidal synthesis method with interpolated magnitudes and phases. Informal listening tests indicate that, in combination with a known LSP quantization technique, this residual coding scheme provides good communication quality at a total bit rate of less than 2.0 kbps.
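For orientation, the direct sinusoidal synthesis that the FFT-based method is meant to speed up can be sketched as a plain sum of harmonics. This is a minimal illustration, not the authors' algorithm: the function name and parameters are hypothetical, and magnitudes and phases are held constant over the frame rather than interpolated. The cost grows as the number of samples times the number of harmonics, which is the complexity the abstract refers to.

```python
import math


def synthesize_voiced(f0_hz, magnitudes, phases, sample_rate_hz, num_samples):
    """Direct synthesis: s[n] = sum_k A_k * cos(2*pi*k*f0*n/fs + phi_k)."""
    out = []
    for n in range(num_samples):
        t = n / sample_rate_hz
        sample = 0.0
        # Harmonic index k starts at 1 (the fundamental).
        for k, (a, phi) in enumerate(zip(magnitudes, phases), start=1):
            sample += a * math.cos(2.0 * math.pi * k * f0_hz * t + phi)
        out.append(sample)
    return out


# Two harmonics of a 100 Hz pitch, 160 samples at 8 kHz (illustrative values).
frame = synthesize_voiced(100.0, [1.0, 0.5], [0.0, 0.0], 8000.0, 160)
```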
M.A. Kohler, U.S. Department of Defense (USA)
L.M. Supplee, U.S. Department of Defense (USA)
T.E. Tremain, U.S. Department of Defense (USA)
To support the need for higher quality low rate voice communications for government, industry, and military customers, the United States Government is conducting a search for a new voice compression algorithm at 2400 bits per second (bps). The United States Department of Defense Digital Voice Processing Consortium (DDVPC), consisting of members from civilian and military branches of the U.S. government, is directing the testing and evaluation of several candidate 2400 bps algorithms. The goal of the DDVPC is to select, by mid-1996, a new algorithm which meets or exceeds the published requirements. The selected algorithm, which is to become the new standard, should be implementable in a small, low-power device by 1997. This paper describes the status of the testing and evaluation process from its beginning in early 1993 through the end of 1994.
Amitava Das, University of California - Santa Barbara (USA)
Allen Gersho, University of California - Santa Barbara (USA)
The low bit rate enhanced multiband excitation (EMBE) speech coder adds several important new features to the basic multiband excitation vocoder, including phonetic classification and a novel spectral quantization technique called variable dimension vector quantization (VDVQ). Phonetic classification allows the adaptation of spectral modeling and quantization to the local acoustic-phonetic character of the speech signal, enhancing quality and robustness. The VDVQ scheme quantizes the log-spectrum with relatively few bits while preserving perceptually important features. Both the fixed rate (2.4 kb/s) and the variable rate (1.44 kb/s average) implementations of EMBE deliver speech quality comparable to the 4.8 kb/s Federal Standard 1016 CELP coder and the 4.15 kb/s Inmarsat-M standard IMBE coder.
V. Cuperman, Simon Fraser University (CANADA)
P. Lupini, Simon Fraser University (CANADA)
B. Bhattacharya, Simon Fraser University (CANADA)
In this paper we present Spectral Excitation Coding (SEC), a speech codec based on a sinusoidal model applied to the excitation signal. A phase dispersion algorithm allows the same model to be used for voiced as well as unvoiced and transitional sounds. The phase dispersion algorithm significantly improves the perceived quality, resulting in more natural reconstructed speech. A new technique for variable-dimension vector quantization called Non-Square Transform Vector Quantization (NSTVQ) is used for quantization of the harmonic magnitudes. The SEC system at 2.45 kb/s achieved a mean opinion score (MOS) 0.8 points higher than that of the 2.4 kb/s LPC-10 standard. A preliminary 1.85 kb/s SEC system which uses zero-bit magnitude quantization is also presented. Informal listening tests indicate that the quality of the 1.85 kb/s system exceeds that of the LPC-10 standard.
P. A. Laurent, Thomson CSF-RGS (FRANCE)
P. de La Noue, Thomson CSF-RGS (FRANCE)
This paper presents a new voice coder for future low bit rate communication systems. The emphasis has been put on speech quality, noise robustness and complexity. The coder performs a multiband+LPC spectral analysis and synthesis of speech. The transmitted information consists of an LPC10 filter, a set of voicing rates, a pitch value, energies, the spectral density of the excitation in five subbands, and information about the stationarity of the signal in each half-frame. Depending on this stationarity, the quantization process is adapted to provide more spectral information (stable speech) or more temporal information (transitory speech). To reduce sensitivity to surrounding noise, pitch and voicing rates are first computed in each subband. The final values of these parameters are obtained from the values in the current frame and its neighbors. The excitation signal used at the synthesis side consists of a mixture of isolated pulses and periodic and aperiodic signals of adjustable spectral composition. Test results are provided.
Gao Yang, Lernout & Hauspie Speech Products
G. Zanellato, Lernout & Hauspie Speech Products
H. Leich, Faculte Polytechnique de Mons (BELGIUM)
For speech coding at bit rates below 4 kbps, attention has concentrated on sinusoidal-based vocoders during the past decade. Several models such as the MBE [8] have been proposed to synthesize high quality speech while removing the buzzy quality often produced by overly strong periodicity. This paper proposes a new model for voiced speech coding at very low bit rates, referred to as Band-Widened Harmonic coding (BWH). The model is shown to offer advantages over existing ones in both quality and complexity. A comparison between the BWH and the MBE is given.
W. Bastiaan Kleijn, AT&T Bell Laboratories (USA)
Jesper Haagen, AT&T Bell Laboratories (USA)
For low-rate speech coding it is advantageous to represent the speech signal as an evolving characteristic waveform (CW). The CW evolves slowly when the speech signal is clearly voiced and rapidly when the speech signal is clearly unvoiced. The voiced (periodic) and unvoiced (nonperiodic) components of the speech signal can be separated by a simple nonadaptive filter in the CW domain. Because of perceptual effects, a significant increase in coding efficiency is obtained by coding these two components separately. A 2.4 kb/s waveform interpolation (WI) coder using these principles was developed. In an independent evaluation, the performance of the 2.4 kb/s WI coder was found to be at least equivalent to that of the 4.8 kb/s FS1016 standard in all of the tests conducted.
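The separation step described above can be illustrated with a fixed smoothing filter applied along the evolution axis of the CW sequence. This is a minimal sketch under stated assumptions: the CWs are taken to be already aligned and of equal length, a moving average stands in for the unspecified nonadaptive filter, and the function name and `half_width` parameter are hypothetical.

```python
def split_characteristic_waveforms(cws, half_width=2):
    """Split a CW sequence into slowly and rapidly evolving parts.

    cws: list of equal-length lists, one characteristic waveform per
    update instant. A moving average across neighbouring CWs (an assumed
    filter) gives the slow part; the remainder is the rapid part.
    """
    slow, rapid = [], []
    for i in range(len(cws)):
        lo = max(0, i - half_width)
        hi = min(len(cws), i + half_width + 1)
        window = cws[lo:hi]
        # Average each waveform sample position across the window.
        smooth = [sum(col) / len(window) for col in zip(*window)]
        slow.append(smooth)
        rapid.append([c - s for c, s in zip(cws[i], smooth)])
    return slow, rapid


# A CW that does not change over time (idealized steady voiced speech).
steady = [[0.0, 1.0, 0.0, -1.0]] * 6
slow, rapid = split_characteristic_waveforms(steady)
```

For the unchanging input above, the rapid part is zero everywhere, matching the statement that a clearly voiced signal has a slowly evolving CW.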
R. Taori, Philips Research Laboratories (THE NETHERLANDS)
R.J. Sluijter, Philips Research Laboratories (THE NETHERLANDS)
E. Kathmann, Philips Research Laboratories (THE NETHERLANDS)
This paper presents a new time-domain algorithm for compressing speech signals. Using a novel tool which we will refer to as Time Weighted Average (TWA), a periodically extendable pitch cycle is extracted from the voiced regions in the speech signal. This procedure is carried out every x-th pitch period. The discarded x - 1 pitch periods are recovered using pitch synchronous interpolation (PSI). The computational complexity of the resulting decoder is surprisingly modest and shows reasonable potential for implementation on hardware as primitive as the Intel 8088 microprocessor. Simulation results show that the reconstruction quality is comparable to G.721.
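The recovery of the discarded pitch periods can be sketched as a cross-fade between two retained cycles. This is an illustration only: linear weighting is an assumption, not necessarily the authors' PSI; the TWA extraction step is not shown; and the sketch requires both cycles to have the same length, which real speech does not guarantee. The function name is hypothetical.

```python
def interpolate_pitch_cycles(cycle_a, cycle_b, x):
    """Return the x - 1 cycles lying between two retained pitch cycles.

    Uses a linear cross-fade (an assumed interpolation rule) between
    cycle_a and cycle_b, which are x pitch periods apart.
    """
    if len(cycle_a) != len(cycle_b):
        raise ValueError("this sketch assumes equal-length pitch cycles")
    recovered = []
    for j in range(1, x):
        w = j / x  # weight moves from cycle_a towards cycle_b
        recovered.append(
            [(1.0 - w) * a + w * b for a, b in zip(cycle_a, cycle_b)]
        )
    return recovered


# Keep every 4th cycle; rebuild the 3 cycles in between (toy 2-sample cycles).
missing = interpolate_pitch_cycles([0.0, 0.0], [4.0, 8.0], x=4)
```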
Haiyun Yang, Nanyang Technological University (SINGAPORE)
Soo-Ngee Koh, Nanyang Technological University (SINGAPORE)
Pratab Sivaprakasapillai, Nanyang Technological University (SINGAPORE)
A novel speech coding algorithm, named pitch synchronous multi-band (PSMB), is proposed. It uses the multiband excitation (MBE) model to generate a representative pitch-cycle waveform (PCW) for each frame. The representative PCW of a frame is encoded by two out of three codebooks depending upon whether the frame is related or unrelated to the previous frame. The new speech coder introduces a pitch-period-based coding feature. The PSMB coder operating at 4 kbps outperforms the Inmarsat 4.15 kbps IMBE coder by a clear margin. It is also found to be slightly better than the FS1016 4.8 kbps code excited linear predictive (CELP) coder in terms of perceptual quality. Fast search algorithms for the three codebooks used in PSMB are also developed.