1:00, SPEECH-P1.1
A 1200 BPS SPEECH CODER WITH LSF MATRIX QUANTIZATION
S. OZAYDIN, B. BAYKAL
A new 1200 bps speech coder based on a tree-searched multistage matrix quantization scheme is proposed. To improve speech quality and reduce the average bit rate, we have developed a new residual multistage matrix quantization method together with a joint design technique. The new joint design algorithm reduces the codebook training complexity. Other new techniques for improving performance include joint quantization of pitch and voiced/unvoiced/mixed decisions, and gain interpolation. Listening tests show that the new matrix-based speech coder (MBC) achieves efficient, high-quality coding at 1200 bps. Test results are compared with the 2400 bps LPC10e coder and with the new 2400 bps MELP coder that has been chosen as the new 2400 bps Federal Standard.
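The sketch below is a minimal illustration, not the authors' trained quantizer, of how a tree-searched multistage matrix quantizer for a block of consecutive LSF vectors can operate; the block size, codebook sizes, and tree width are hypothetical, and the codebooks are random rather than trained.

import numpy as np

def ms_matrix_quantize(lsf_block, stage_codebooks, tree_width=4):
    """Tree-searched multistage matrix quantization of an LSF block (sketch).

    lsf_block       : (n_frames, lsf_order) matrix of consecutive LSF vectors
    stage_codebooks : list of arrays, each (codebook_size, n_frames, lsf_order)
    tree_width      : number of surviving paths (M-best) kept after each stage
    """
    # Each survivor is (residual, chosen indices, accumulated quantized block)
    survivors = [(lsf_block, [], np.zeros_like(lsf_block))]
    for cb in stage_codebooks:
        candidates = []
        for residual, path, acc in survivors:
            # Squared Frobenius distance between the residual and each code matrix
            dist = np.sum((cb - residual) ** 2, axis=(1, 2))
            for idx in np.argsort(dist)[:tree_width]:
                candidates.append((residual - cb[idx], path + [int(idx)], acc + cb[idx]))
        # Keep only the best partial paths, ranked by remaining residual energy
        candidates.sort(key=lambda c: np.sum(c[0] ** 2))
        survivors = candidates[:tree_width]
    _, best_path, best_block = survivors[0]
    return best_path, best_block

# Illustration with random, untrained codebooks (3 stages, 64 entries each)
rng = np.random.default_rng(0)
lsf_block = np.sort(rng.uniform(0, np.pi, (4, 10)), axis=1)      # 4 frames of order-10 LSFs
codebooks = [np.sort(rng.uniform(0, np.pi, (64, 4, 10)), axis=2)] \
          + [rng.normal(0.0, 0.05, (64, 4, 10)) for _ in range(2)]
indices, quantized = ms_matrix_quantize(lsf_block, codebooks)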
1:00, SPEECH-P1.2
A CANDIDATE FOR THE ITU-T 4 KBIT/S SPEECH CODING STANDARD
J. THYSSEN, Y. GAO, A. BENYASSINE, E. SHLOMOT, C. MURGIA, H. SU, K. MANO, Y. HIWASAKI, H. EHARA, K. TASUNAGA, C. LAMBLIN, B. KOVESI, J. STEGMANN, H. KANG
This paper presents the 4 kbit/s speech coding candidate submitted by AT&T, Conexant, Deutsche Telekom, France Telecom, Matsushita, and NTT for the ITU-T 4 kbit/s selection phase. The algorithm was developed jointly, starting from Conexant's qualification version. This paper focuses on the development carried out during the collaboration to narrow the gap to the requirements and provide toll quality at 4 kbit/s. This objective is currently being verified in independent subjective tests coordinated by ITU-T and carried out in multiple languages. Subjective tests carried out during the development indicate that the collaboration has succeeded in improving quality, and that meeting a majority of the requirements in the extensive selection phase test is a realistic goal.
1:00, SPEECH-P1.3
A HYBRID CODER BASED ON A NEW PHASE MODEL FOR SYNCHRONIZATION BETWEEN HARMONIC AND WAVEFORM CODED SEGMENTS
N. KATUGAMPALA, A. KONDOZ
This paper presents a hybrid coder, with a target bit rate of 4 kbps, that uses a new phase model to synchronize harmonic and waveform coded segments. The coder also employs a new analysis-by-synthesis technique to distinguish between stationary and transitional segments. The harmonic excitation is synchronized with the LPC residual by transmitting the location of the pitch pulse closest to the frame boundary and a phase value that represents the shape of the corresponding pitch pulse. The performance of this phase model and the classification technique is evaluated using a hybrid coder with three modes: scaled white noise excitation colored by LPC for unvoiced segments, ACELP for transitions, and harmonic excitation for stationary segments. Subjective listening tests show that the coder produces good quality speech and that switching between the modes is transparent.
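As a rough illustration of the kind of information such a phase model transmits, the sketch below shows one simple way, not necessarily the authors', to locate the residual pitch pulse closest to the frame boundary and derive a single phase value from its neighbourhood; the window length and FFT-based phase descriptor are assumptions.

import numpy as np

def pulse_location_and_phase(residual, frame_end, pitch_period):
    """Locate the pitch pulse nearest the frame boundary and a phase value (sketch).

    residual     : LPC residual samples
    frame_end    : sample index of the frame boundary within `residual`
    pitch_period : estimated pitch period in samples
    """
    # Search the last pitch period before the boundary for the strongest pulse
    start = max(0, frame_end - pitch_period)
    pulse_pos = start + int(np.argmax(np.abs(residual[start:frame_end])))

    # A simple phase descriptor for the pulse shape: phase of the fundamental
    # of a short window centred on the pulse (stand-in for the paper's model)
    half = max(1, pitch_period // 2)
    win = residual[max(0, pulse_pos - half):pulse_pos + half]
    spectrum = np.fft.rfft(win * np.hanning(len(win)))
    phase = float(np.angle(spectrum[1])) if len(spectrum) > 1 else 0.0
    return pulse_pos, phase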
1:00, SPEECH-P1.4
EX-CELP: A SPEECH CODING PARADIGM
Y. GAO, A. BENYASSINE, J. THYSSEN, H. SU, E. SHLOMOT
This paper presents the core technology of novel enhancements to traditional CELP coding, coined eXtended CELP (eX-CELP). It is centered on a combined and selective use of closed-loop and open-loop approaches and on the concept of a variant algorithm structure. These two concepts are complemented by new features and by refinements of existing techniques. The eX-CELP paradigm has been used in several speech coding systems. It is the core technology of the recently chosen candidate for the 3G-CDMA speech codec standard. It was the best candidate in the ITU-T 4 kbps codec qualification test and became the basis for a consortium candidate in the ITU-T 4 kbps speech coding competition.
1:00, SPEECH-P1.5
MAXIMUM-TAKE-PRECEDENCE ACELP: A LOW COMPLEXITY SEARCH METHOD
F. CHEN, J. YANG
The ACELP method uses a multipulse structure to represent the excitation pulses of the residual signal. To reduce computational complexity, this paper presents the Maximum-Take-Precedence ACELP (MTP-ACELP) search method, which accepts only a small degradation in performance. Because the maximum of the target signal is compensated first, the performance degradation is kept small. Predicting the pulse locations further reduces the computational complexity: we not only reduce the number of pulse combinations examined in the search procedure but also avoid computing unneeded correlation functions before the search. Furthermore, the proposed method is compatible with any ACELP-type vocoder, e.g. the G.723.1, G.729, and GSM-EFR standards.
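A simplified sketch of the greedy idea follows; the track layout, candidate count, and criterion details are illustrative assumptions rather than the paper's exact procedure.

import numpy as np

def mtp_acelp_search(d, phi, tracks, candidates_per_track=4):
    """Simplified Maximum-Take-Precedence pulse search (illustrative only).

    d      : backward-filtered target signal, length L
    phi    : (L, L) correlation matrix of the weighted impulse response
    tracks : list of allowed position arrays, one pulse per track
    """
    # Process tracks in order of their largest |d| value, so the pulse that
    # compensates the maximum of the target signal is decided first.
    order = sorted(range(len(tracks)), key=lambda t: -np.max(np.abs(d[tracks[t]])))
    positions, signs = [], []
    for t in order:
        track = np.asarray(tracks[t])
        # Only the most promising positions on each track are tried (predicted
        # pulse locations), instead of every position combination.
        cand = track[np.argsort(-np.abs(d[track]))[:candidates_per_track]]
        best_pos, best_ratio = int(cand[0]), -np.inf
        for p in cand:
            pos = positions + [int(p)]
            sgn = signs + [float(np.sign(d[p]) or 1.0)]
            num = sum(s * d[q] for q, s in zip(pos, sgn)) ** 2
            den = sum(si * sj * phi[qi, qj]
                      for qi, si in zip(pos, sgn) for qj, sj in zip(pos, sgn))
            if den > 1e-12 and num / den > best_ratio:
                best_pos, best_ratio = int(p), num / den
        positions.append(best_pos)
        signs.append(float(np.sign(d[best_pos]) or 1.0))
    return list(zip(positions, signs))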
1:00, SPEECH-P1.6
SPECTRAL MAGNITUDE QUANTIZATION BASED ON LINEAR TRANSFORMS FOR 4KB/S SPEECH CODING
C. ETEMOGLU, V. CUPERMAN
This paper presents a matching pursuits sinusoidal speech coder which incorporates new techniques, including a novel vector quantization (VQ) technique used for weighted quantization of the spectral magnitude vector, and inter-frame quantization of the spectral magnitudes using an interpolation matrix that minimizes the weighted interpolation error. In the proposed VQ technique, the quantized vector is obtained by applying a linear transformation, selected from a first codebook, to a codevector selected from a second codebook. The transformation is selected from a family of linear transformations represented by a matrix codebook; vectors in the second codebook are called residual codevectors. To avoid high complexity during the search for the best linear transformation, each transformation is assigned a representative vector, so that the search can be carried out over the representative vectors. The VQ design algorithm is based on joint optimization of the linear transformation and residual codebooks. The introduced techniques are general enough to be used in any sinusoidal speech coding scheme. In this work we incorporated them into the matching pursuits sinusoidal model to achieve high quality speech at 4 kbps. Subjective tests indicate that the proposed coder at 4 kbps has quality comparable to that of G.729 at 8 kbps.
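One plausible reading of this two-codebook structure, with hypothetical codebook shapes and a simple weighted squared-error measure, is sketched below: the quantized vector is T_i applied to residual codevector c_j, and the transformation is preselected by comparing the input against one representative vector per transformation.

import numpy as np

def transform_vq_encode(x, transforms, reps, residual_cb, weights=None):
    """Encode x as transforms[i] @ residual_cb[j] (sketch of the described structure).

    transforms  : (K, D, D) matrix codebook of linear transformations
    reps        : (K, D) one representative vector per transformation, used to
                  preselect the transform cheaply instead of trying all pairs
    residual_cb : (N, D) residual codevectors
    weights     : optional (D,) perceptual weights for the distortion measure
    """
    w = np.ones_like(x, dtype=float) if weights is None else weights
    # Stage 1: pick the transformation whose representative vector is closest
    # to the input under the weighted distance (avoids a full joint search).
    i = int(np.argmin(np.sum(w * (reps - x) ** 2, axis=1)))
    # Stage 2: search the residual codebook through the chosen transformation.
    cand = residual_cb @ transforms[i].T          # (N, D) candidate reconstructions
    j = int(np.argmin(np.sum(w * (cand - x) ** 2, axis=1)))
    return i, j, cand[j]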
1:00, SPEECH-P1.7
SPEECH LSF QUANTIZATION WITH RATE INDEPENDENT COMPLEXITY, BIT SCALABILITY AND LEARNING
A. SUBRAMANIAM, B. RAO
A computationally efficient, high quality vector quantization scheme based on a parametric probability density function (PDF) is proposed. In this scheme, speech line spectral frequencies are modeled as i.i.d. realizations of a multivariate Gaussian mixture density. The mixture model parameters are efficiently estimated using the Expectation Maximization (EM) algorithm. An efficient quantization scheme using transform coding and bit allocation techniques, which allows for an easy and computationally efficient mapping from observation to quantized value, is developed for both fixed rate and variable rate systems. An attractive feature of this method is that source encoding with the resultant codebook involves very few searches, and its computational complexity is minimal and independent of the rate of the system. Furthermore, the proposed scheme is bit scalable and can switch seamlessly between a memoryless quantizer and a quantizer with memory. The performance of the memoryless quantizer is 2-3 bits better than that of conventional quantization schemes.
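A minimal sketch of this style of GMM-based quantizer is shown below; it uses scikit-learn's GaussianMixture as a stand-in for the EM estimation, and a simple high-rate-style bit allocation with uniform scalar quantization of the decorrelated components, which are assumptions rather than the authors' exact design.

import numpy as np
from sklearn.mixture import GaussianMixture

def train_lsf_gmm(lsf_data, n_mixtures=8):
    # EM estimation of a Gaussian mixture model for the LSF vectors
    return GaussianMixture(n_components=n_mixtures, covariance_type='full').fit(lsf_data)

def quantize_lsf(x, gmm, bits_total=24):
    """Quantize one LSF vector with the best-fitting mixture component (sketch)."""
    best = None
    for m in range(gmm.n_components):
        mean, cov = gmm.means_[m], gmm.covariances_[m]
        eigval, eigvec = np.linalg.eigh(cov)          # per-cluster decorrelating transform
        eigval = np.maximum(eigval, 1e-12)
        y = eigvec.T @ (x - mean)                     # decorrelated components
        # High-rate style bit allocation: more bits to higher-variance components
        b = bits_total / len(y) + 0.5 * np.log2(eigval / np.exp(np.mean(np.log(eigval))))
        b = np.clip(np.round(b), 0, None)
        # Uniform scalar quantization of each component (approx. 4-sigma loading)
        step = 8.0 * np.sqrt(eigval) / (2.0 ** b)
        xq = eigvec @ (np.round(y / step) * step) + mean
        err = float(np.sum((x - xq) ** 2))
        if best is None or err < best[0]:
            best = (err, m, xq)
    return best[1], best[2]                           # mixture index, quantized LSFs

Because each input is encoded against a fixed number of mixture components, the search cost stays the same regardless of the bit rate, which mirrors the rate-independent complexity claimed above.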
1:00, SPEECH-P1.8
THE SMV ALGORITHM SELECTED BY TIA AND 3GPP2 FOR CDMA APPLICATIONS
Y. GAO, A. BENYASSINE, J. THYSSEN, H. SU, C. MURGIA, E. SHLOMOT
During 1999 and 2000, the Telecommunications Industry Association (TIA) and the 3rd Generation Partnership Project 2 (3GPP2) managed a competition and selection process for a new speech coding standard for CDMA applications. The new standard, coined the Selectable Mode Vocoder (SMV), will become a service option in CDMA systems such as IS-95 and cdma2000, providing higher quality, flexibility, and capacity than the existing speech coding service options IS-96C, IS-127, and IS-733. Eight companies submitted candidates to the selection phase. In all 36 test conditions, the Conexant SMV candidate was ranked at the top or was statistically equivalent to the top-ranking candidate, and it was chosen as the core speech coding technology for the SMV system. This paper describes the SMV algorithm developed by Conexant.
1:00, SPEECH-P1.9
UNIVERSAL SUCCESSIVE REFINEMENT OF CELP SPEECH CODERS
H. DONG, J. GIBSON
Many speech coding standards are based upon code-excited linear prediction (CELP), and it is desirable to develop layered coding methods that are compatible with this installed base of coders. We propose a layered speech coding structure that is universally compatible with all CELP-based coders. This structure encodes the reconstruction error signal from layer 1 using a low-delay, adaptive tree coder based upon the mean squared error (MSE) criterion. We note that rate distortion optimal successive refinement is achievable using two different distortion criteria, and we derive expressions for the rate distortion function under autoregressive Gaussian assumptions on the source and the two different distortion measures. We demonstrate the universality of the approach by developing two-layer coders for a 3.65 kbps CELP coder, G.723.1, and G.729. We show that our layering method compares favorably with the MPEG-4 layering method at 8.7 kbps for both clean and noisy speech. Using tree coding and the MSE criterion in layer 2 improves speech naturalness when coding noisy speech.
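The layering principle can be illustrated with the toy sketch below, in which layer 1 is an arbitrary CELP codec used as a black box and layer 2 uniformly quantizes the layer-1 reconstruction error under MSE; the uniform quantizer is only a stand-in for the paper's low-delay adaptive tree coder, and the function names are hypothetical.

import numpy as np

def encode_two_layer(speech, base_encode, base_decode, layer2_bits=3):
    """Universal two-layer coding sketch: base codec plus quantized error signal.

    base_encode/base_decode : any CELP codec, used as a black box (layer 1)
    layer2_bits             : bits per sample for the enhancement layer
    """
    bitstream1 = base_encode(speech)
    recon1 = base_decode(bitstream1)
    error = speech - recon1                       # layer-1 reconstruction error

    # Layer 2: uniform quantization of the error under the MSE criterion
    # (stand-in for the low-delay adaptive tree coder used in the paper)
    levels = 2 ** layer2_bits
    peak = np.max(np.abs(error)) + 1e-12
    step = 2.0 * peak / levels
    indices = np.clip(np.round(error / step), -levels // 2, levels // 2 - 1)
    return bitstream1, indices.astype(int), step

def decode_two_layer(bitstream1, indices, step, base_decode):
    # Decoding layer 1 alone is always possible; layer 2 refines it when present
    return base_decode(bitstream1) + indices * step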
1:00, SPEECH-P1.10
SEW REPRESENTATION FOR LOW RATE WI CODING
J. LUKASIAK, I. BURNETT
This paper considers low-rate Waveform Interpolation (WI) coding. It compares the existing, common Slowly Evolving Waveform (SEW) quantisation scheme with two new schemes for representing and quantising the SEW. The first scheme uses a minimum phase estimate to reconstruct the SEW, whilst the second uses a pulse model whose parameters are implicitly transmitted in the quantised rapidly evolving waveform (REW). These new schemes maintain or reduce the bit rate required for transmission of the SEW. Results indicate that, for low rate WI coding, the necessarily coarse quantisation of the SEW magnitude spectrum limits the contribution of the SEW to perceptual quality. Perceptual tests indicate that avoiding coarse spectral shape quantisation and using a fixed shape model that lends itself to smooth interpolation maintains the perceptual quality of the synthesized speech. The proposed fixed shape model requires no bits for transmission, allowing a 12 percent reduction in the overall coder bit rate.
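The minimum-phase reconstruction mentioned for the first scheme can be realized, for example, with the standard real-cepstrum construction sketched below; the paper's exact estimate may differ, and the spectral floor is an assumption for numerical safety.

import numpy as np

def minimum_phase_from_magnitude(magnitude):
    """Reconstruct a minimum-phase waveform from a magnitude spectrum (sketch).

    magnitude : one-sided magnitude spectrum (length N//2 + 1, e.g. of the SEW)
    """
    n_half = len(magnitude)
    n = 2 * (n_half - 1)
    # Full symmetric magnitude spectrum and its real cepstrum
    full_mag = np.concatenate([magnitude, magnitude[-2:0:-1]])
    cepstrum = np.fft.ifft(np.log(np.maximum(full_mag, 1e-12))).real
    # Fold the cepstrum to impose minimum phase (standard homomorphic recipe)
    folded = np.zeros(n)
    folded[0] = cepstrum[0]
    folded[1:n // 2] = 2.0 * cepstrum[1:n // 2]
    folded[n // 2] = cepstrum[n // 2]
    # Back to the frequency domain: magnitude is preserved, phase is minimum phase
    spectrum = np.exp(np.fft.fft(folded))
    return np.fft.ifft(spectrum).real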