Authors:
Mohammad Reza Nakhai,
Farokh A Marvasti,
Page (NA) Paper number 1257
Abstract:
In this paper, we discuss the split band code-excited linear prediction
(SB-CELP) speech coder, which employs an iterative version of the harmonic
sinusoidal coding algorithm to encode the periodic content of the speech
signal. The speech spectrum is split into two frequency regions of harmonic
and random components, and a reliable fundamental frequency is estimated
for the harmonic region using both the speech and its linear predictive
(LP) residual spectrum. The resulting sinusoidal parameters are interpolated
to reconstruct the periodicity in the speech waveform. The level of periodicity
is controlled by computing a cutoff frequency between the harmonic
and random regions of the spectrum. The random part of the spectrum and unvoiced
speech are processed using the CELP coding algorithm. The SB-CELP speech
coder, which combines the powerful features of the sinusoidal and CELP
coding algorithms, yields high quality synthetic speech at 4.05 kb/s.
Authors:
Najam Malik, School of Electrical Engineering, University of New South Wales, Australia. (Australia)
W. Harvey Holmes, School of Electrical Engineering, University of New South Wales, Australia. (Australia)
Page (NA) Paper number 1278
Abstract:
We present an algorithm for all-pole (envelope) modeling of the amplitudes
of sinusoids present in voiced speech segments which works even when
the number of sinusoids is very small, as occurs with high-pitched
speakers. In contrast to previous methods, this algorithm minimizes
a squared error criterion in the log amplitude domain rather than the
amplitude domain, and so is better matched to the properties of the
human auditory system. A weighted iterative approach is used to get
near optimal solutions to this otherwise nonlinear problem. This new
frequency domain log amplitude modeling (LAM) algorithm gives impressive
results, especially in the case of high pitched female voices where
conventional linear prediction methods are inadequate. The algorithm
can easily be generalized to develop pole-zero models.
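
As a rough illustration of the contrast drawn in this abstract, the two error criteria can be written side by side. The notation is assumed for illustration and is not taken from the paper: A_k are the measured sinusoid amplitudes at frequencies \omega_k, H(z) is an all-pole model of order p with gain G and coefficients a_i, and any weighting used by the iterative procedure is omitted.

```latex
E_{\mathrm{amp}} = \sum_{k=1}^{K} \bigl( A_k - |H(e^{j\omega_k})| \bigr)^2,
\qquad
E_{\mathrm{log}} = \sum_{k=1}^{K} \bigl( \ln A_k - \ln |H(e^{j\omega_k})| \bigr)^2,
\qquad
H(z) = \frac{G}{1 + \sum_{i=1}^{p} a_i z^{-i}}
```

Because E_log is nonlinear in the coefficients a_i, a closed-form solution is not available in general, which is consistent with the abstract's use of an iterative approach.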
Authors:
Minoru Kohata,
Page (NA) Paper number 1356
Abstract:
In this paper, a new very low bit-rate speech coder operating at 1.2 kbit/s is proposed.
Like the LPC vocoder, it only requires gain, pitch, and spectral information,
but its quality is far superior. The synthesis method is one of harmonic
coding, using sinusoids whose frequencies are multiples of the fundamental
frequency, where the amplitudes of the sinusoids are adaptively modulated
using Gammatone filters as a perceptual weighting filter. The sinusoids'
phases are also adjusted so as to maximize the perceptual quality.
In order to reduce the total bit rate to 1.2 kbit/s, a new segment
coder for spectral information (LSP coefficients) using DP matching
is also proposed. The quality of the synthesized speech was improved
by 0.45 in the Mean Opinion Score (MOS) compared with that of the simple
LPC vocoder operating at the same rate, and it was comparable to that
of the 2.4 kbit/s MELP coder.
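
The harmonic synthesis idea in this abstract (sinusoids at multiples of the fundamental frequency) can be sketched as below. This is a minimal illustration only: the function name, the 8 kHz sample rate, and the example amplitudes are assumptions, and the Gammatone-based amplitude modulation and phase adjustment described in the abstract are not modeled.

```python
import math

def synthesize_harmonics(f0_hz, amplitudes, phases, sample_rate_hz, num_samples):
    """Sum sinusoids at integer multiples of f0_hz (hypothetical helper)."""
    frame = []
    for n in range(num_samples):
        t = n / sample_rate_hz
        sample = 0.0
        for k, (amp, phase) in enumerate(zip(amplitudes, phases), start=1):
            # Stop at the Nyquist frequency to avoid aliasing.
            if k * f0_hz >= sample_rate_hz / 2:
                break
            sample += amp * math.cos(2 * math.pi * k * f0_hz * t + phase)
        frame.append(sample)
    return frame

# Assumed example: 200 Hz fundamental, three harmonics, 20 ms at 8 kHz.
frame = synthesize_harmonics(200.0, [1.0, 0.5, 0.25], [0.0, 0.0, 0.0], 8000.0, 160)
```

With these assumed values the waveform repeats every 40 samples, the pitch period of a 200 Hz fundamental at 8 kHz.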
Authors:
Jesper Jensen,
Søren Holdt Jensen,
Egon Hansen,
Page (NA) Paper number 1446
Abstract:
A generalized sinusoidal model for speech signal processing is studied.
The main feature of the model is that the amplitude of each sinusoidal
component is allowed to vary exponentially with time. We propose to
use the model in transitional speech segments such as speech onsets
and voiced/unvoiced transitions. Computer simulations with natural
speech signals indicate substantially better modeling performance in
both transitional and voiced regions compared with the traditional
constant-amplitude sinusoidal model.
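
A sinusoidal model whose component amplitudes vary exponentially with time can be sketched as follows. The parameter layout (amplitude, growth rate per sample, angular frequency per sample, phase) and the example values are assumptions for illustration, not the authors' formulation; setting the growth rate to zero recovers the constant-amplitude model.

```python
import math

def exp_amplitude_sinusoids(components, num_samples):
    """Evaluate sum_k a_k * exp(d_k * n) * cos(w_k * n + p_k) for n = 0..num_samples-1.

    Each component is an assumed tuple (a, d, w, p):
    amplitude, exponential rate per sample, radians per sample, phase.
    """
    return [
        sum(a * math.exp(d * n) * math.cos(w * n + p) for a, d, w, p in components)
        for n in range(num_samples)
    ]

# Assumed example: a speech onset modeled as one growing component.
onset = exp_amplitude_sinusoids([(0.1, 0.05, 0.3, 0.0)], 40)
# Same component with d = 0, i.e. the constant-amplitude model.
steady = exp_amplitude_sinusoids([(0.1, 0.0, 0.3, 0.0)], 40)
```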
Authors:
Eric W. M. Yu, City University of Hong Kong (Hong Kong)
Cheung-Fat Chan, City University of Hong Kong (Hong Kong)
Page (NA) Paper number 1596
Abstract:
This paper presents a harmonic+noise speech coder which uses an efficient
spectral quantization technique and a novel voiced/unvoiced (V/UV)
mixing model. The harmonic magnitudes are coded at 23 bits/frame using
the magnitude response of a linear predictive coding (LPC) system.
The difference between the harmonic magnitudes and the sampled magnitude
response is minimized by the closed-loop approach. The V/UV mixing
is modeled by a smooth function which is derived from the speech spectrum
envelope based on the flatness measure. The V/UV mixing model allows
noise to be added in the harmonic portion of speech spectrum so that
buzziness is reduced. Because the V/UV mixing information is determined from
the spectral parameters available in the decoder, no bits are needed
for transmitting the V/UV information. A 1.4 kbps harmonic coder is
developed. The speech quality of the coder is comparable to other harmonic
coders operating at higher rates.
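
Representing harmonic magnitudes by the magnitude response of an LPC system amounts to sampling that response at the pitch harmonics. The sketch below assumes the convention A(z) = 1 + sum_i a_i z^-i and uses hypothetical names and values; the squared-error function only shows the kind of quantity a closed-loop search could minimize and is not the paper's procedure.

```python
import cmath
import math

def lpc_magnitude_at_harmonics(lpc_coeffs, gain, f0_hz, sample_rate_hz):
    """Sample |G / A(e^{jw})| at harmonics of f0_hz below the Nyquist frequency."""
    mags = []
    k = 1
    while k * f0_hz < sample_rate_hz / 2:
        omega = 2 * math.pi * k * f0_hz / sample_rate_hz
        # A(e^{jw}) = 1 + sum_i a_i * e^{-j*w*i}  (assumed sign convention)
        a_of_z = 1 + sum(
            a * cmath.exp(-1j * omega * i) for i, a in enumerate(lpc_coeffs, start=1)
        )
        mags.append(gain / abs(a_of_z))
        k += 1
    return mags

def squared_error(target_mags, model_mags):
    """Sum of squared differences between measured and modeled harmonic magnitudes."""
    return sum((t - m) ** 2 for t, m in zip(target_mags, model_mags))

# Assumed example: first-order model, 1 kHz pitch, 8 kHz sampling.
mags = lpc_magnitude_at_harmonics([-0.9], 1.0, 1000.0, 8000.0)
```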
Authors:
Suat Yeldener, COMSAT Laboratories, Clarksburg, Maryland, USA (USA)
Page (NA) Paper number 1731
Abstract:
The Harmonic Excitation Linear Predictive Speech Coder (HE-LPC) is
a technique derived from MBE- and MB-LPC-type speech coding algorithms.
The HE-LPC coder has the potential of producing high quality speech
at 4.8 kb/s and below. This coder employs a new pitch estimation and
voicing technique. In addition, new DCT based LPC and residual amplitude
quantization techniques have been developed. The 4 kb/s HE-LPC coder
with a 14th order LPC filter was found to produce much better speech
quality than various low rate speech coding standards, including
the 3.6 kb/s INMARSAT Mini-M AMBE vocoder. In a formal ITU ACR test,
the 4 kb/s HE-LPC vocoder was found to perform equivalently
to 32 kb/s ADPCM and G.729 for both flat and modified IRS filtered
clean input speech conditions. The HE-LPC algorithm can also be extended
to cover bit rates in the 1.2 to 8 kb/s range, depending on the application.
Authors:
Jacek Stachurski,
Alan V McCree,
Vishu R Viswanathan,
Page (NA) Paper number 2072
Abstract:
Recently, a number of coding techniques have been reported to achieve
near toll quality synthesized speech at bit-rates around 4 kb/s. These
include variants of Code Excited Linear Prediction (CELP), Sinusoidal
Transform Coding (STC) and Multi-Band Excitation (MBE). While CELP
has been an effective technique for bit-rates above 6 kb/s, STC, MBE,
Waveform Interpolation (WI) and Mixed Excitation Linear Prediction
(MELP) models seem to be attractive at bit-rates below 3 kb/s. In this
paper, we present a system to encode speech with high quality using
MELP, a technique previously demonstrated to be effective at bit-rates
of 1.6 to 2.4 kb/s. We have enhanced the MELP model, producing significantly
higher speech quality at bit-rates above 2.4 kb/s. We describe the
development and testing of a high quality 4 kb/s MELP coder.
Authors:
Thomas Eriksson,
Hong-Goo Kang,
Page (NA) Paper number 2329
Abstract:
This paper describes a new pitch quantization method for low bit-rate
speech coding systems. The logarithm of the pitch period is quantized
by a combination of two uniform quantizers, one working directly on
logarithmic pitch values and the other working on the difference between
current and previous logarithmic pitch. The best of the two output
values is transmitted to the receiver. This scheme can exploit both
redundancy in the signal and properties of the ear to achieve an efficient
quantization. Listening tests show that the proposed scheme allows
the pitch parameter to be quantized using 4 bits, with no degradation
in audible quality.
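
The two-quantizer scheme described in this abstract can be sketched as follows. Everything numeric here is an assumption made for illustration: the pitch-period range of 20 to 160 samples, the differential range of +/-0.2 in log2 units, and the split of the 4 bits into 1 mode bit plus 3 index bits. The abstract does not state how the bits are allocated or how the choice of quantizer is signaled.

```python
import math

LOG_MIN, LOG_MAX = math.log2(20.0), math.log2(160.0)  # assumed period range (samples)
DIFF_MAX = 0.2   # assumed largest log2-pitch change the differential quantizer covers
LEVELS = 8       # assumed: 3 index bits, plus 1 mode bit, for 4 bits in total

def _uniform(value, lo, hi, levels):
    """Uniform quantizer on [lo, hi]; returns (index, reconstruction)."""
    step = (hi - lo) / (levels - 1)
    index = min(levels - 1, max(0, round((value - lo) / step)))
    return index, lo + index * step

def encode_pitch(period, prev_log_recon):
    """Quantize log2(period) both ways and keep the closer reconstruction."""
    target = math.log2(period)
    abs_idx, abs_rec = _uniform(target, LOG_MIN, LOG_MAX, LEVELS)
    diff_idx, diff_val = _uniform(target - prev_log_recon, -DIFF_MAX, DIFF_MAX, LEVELS)
    diff_rec = prev_log_recon + diff_val
    if abs(diff_rec - target) < abs(abs_rec - target):
        return 1, diff_idx, diff_rec   # mode 1: differential quantizer
    return 0, abs_idx, abs_rec         # mode 0: direct quantizer

def decode_pitch(mode, index, prev_log_recon):
    """Rebuild the log2 pitch from the mode bit, index, and previous reconstruction."""
    if mode == 1:
        return prev_log_recon + (-DIFF_MAX + index * (2 * DIFF_MAX / (LEVELS - 1)))
    return LOG_MIN + index * ((LOG_MAX - LOG_MIN) / (LEVELS - 1))
```

In this sketch, slowly varying pitch tends to select the differential quantizer, whose step is finer, while a large pitch jump falls back to the direct quantizer.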