|
Abstract: Session SP-15 |
|
SP-15.1
|
Split Band CELP (SB-CELP) Speech Coder
Mohammad R Nakhai,
Farokh A Marvasti (King's College, University of London)
In this paper, we discuss the split band code-excited
linear prediction (SB-CELP) speech coder, which employs
an iterative version of the harmonic sinusoidal coding
algorithm to encode the periodic content of the speech
signal. The speech spectrum is split into two frequency
regions of harmonic and random components, and a
reliable fundamental frequency is estimated for the
harmonic region using both the speech and its linear
predictive (LP) residual spectrum. The resulting
sinusoidal parameters are interpolated to reconstruct
the periodicity in the speech waveform. The level of
periodicity is controlled by computing a cutoff
frequency between the harmonic and random regions of
the spectrum. The random part of the spectrum and unvoiced
speech are processed using the CELP coding algorithm.
The SB-CELP speech coder, which combines the powerful
features of the sinusoidal and CELP coding algorithms,
yields high-quality synthetic speech at 4.05 kb/s.
|
SP-15.2
|
Log Amplitude Modeling of Sinusoids in Voiced Speech
Najam Malik,
W. Harvey Holmes (School of Electrical Engineering, University of New South Wales, Australia.)
We present an algorithm for all-pole (envelope) modeling of the amplitudes of sinusoids present in voiced speech segments, which works even when the number of sinusoids is very small, as occurs with high-pitched speakers. In contrast to previous methods, this algorithm minimizes a squared error criterion in the log amplitude domain rather than the amplitude domain, and so is better matched to the properties of the human auditory system. A weighted iterative approach is used to obtain near-optimal solutions to this otherwise nonlinear problem. This new frequency domain log amplitude modeling (LAM) algorithm gives impressive results, especially in the case of high-pitched female voices, where conventional linear prediction methods are inadequate. The algorithm can easily be generalized to develop pole-zero models.
|
SP-15.3
|
1.2kbit/s Harmonic Coder Using Auditory Filters
Minoru Kohata (Chiba Institute of Technology)
In this paper, a new very low bit-rate speech coder operating at
1.2 kbit/s is proposed. Like the LPC vocoder, it only requires gain, pitch,
and spectral information, but its quality is far superior.
The synthesis method is a form of harmonic coding, using sinusoids
whose frequencies are multiples of the fundamental frequency,
where the amplitudes of the sinusoids are adaptively modulated
using Gammatone filters as a perceptual weighting filter. The
sinusoids' phases are also adjusted so as to maximize the
perceptual quality. In order to reduce the total bit rate to
1.2 kbit/s, a new segment coder for spectral information (LSP
coefficients) using DP matching is also proposed. The quality
of the synthesized speech was improved by 0.45 in the Mean
Opinion Score (MOS) compared with that of a simple LPC vocoder
operating at the same rate, and it was comparable to that of
the 2.4 kbit/s MELP coder.
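The core synthesis idea in the abstract above, summing sinusoids whose frequencies are integer multiples of the fundamental, can be illustrated with a minimal sketch. The function name `synthesize_harmonics`, its parameters, and the example values are hypothetical and are not taken from the paper; the Gammatone-based amplitude modulation, the phase adjustment, and the segment coding of LSP coefficients are not modeled here.

```python
import math

def synthesize_harmonics(f0_hz, amplitudes, phases, sample_rate_hz, num_samples):
    """Sum cosines at integer multiples of f0_hz (illustrative helper only)."""
    nyquist = sample_rate_hz / 2.0
    frame = []
    for n in range(num_samples):
        t = n / sample_rate_hz
        sample = 0.0
        # Harmonic k sits at k * f0; harmonics at or above Nyquist are skipped.
        for k, (amp, phase) in enumerate(zip(amplitudes, phases), start=1):
            freq = k * f0_hz
            if freq >= nyquist:
                break
            sample += amp * math.cos(2.0 * math.pi * freq * t + phase)
        frame.append(sample)
    return frame

# Assumed example: 100 Hz pitch, three harmonics, 8 kHz sampling, 20 ms frame.
frame = synthesize_harmonics(100.0, [1.0, 0.5, 0.25], [0.0, 0.0, 0.0], 8000.0, 160)
```

In a coder of this kind the amplitudes would come from the decoded spectral envelope rather than being fixed constants as they are in this example.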
|
SP-15.4
|
Exponential Sinusoidal Modeling of Transitional Speech Segments
Jesper Jensen,
Søren H Jensen,
Egon Hansen (CPK, Aalborg University)
A generalized sinusoidal model for speech signal processing is
studied. The main feature of the model is that the amplitude of each
sinusoidal component is allowed to vary exponentially with time. We
propose to use the model in transitional speech segments such as
speech onsets and voiced/unvoiced transitions. Computer simulations
with natural speech signals indicate substantially better modeling
performance in both transitional and voiced regions compared with the
traditional constant-amplitude sinusoidal model.
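As a rough illustration of the model described above, the sketch below generates components whose amplitudes vary exponentially with time. The parameterization `amp * exp(growth * n) * cos(omega * n + phase)` and all names and values are assumptions made for illustration; the abstract does not give the authors' exact formulation or their parameter estimation method. Setting `growth` to zero recovers the constant-amplitude sinusoidal model.

```python
import math

def exp_sinusoid(amp, growth, omega, phase, num_samples):
    """One component: amp * exp(growth * n) * cos(omega * n + phase)."""
    return [amp * math.exp(growth * n) * math.cos(omega * n + phase)
            for n in range(num_samples)]

def exp_sinusoidal_model(components, num_samples):
    """Sum components given as (amp, growth, omega, phase) tuples."""
    out = [0.0] * num_samples
    for amp, growth, omega, phase in components:
        for n, value in enumerate(exp_sinusoid(amp, growth, omega, phase, num_samples)):
            out[n] += value
    return out

# Assumed example: a growing component, as might describe a speech onset,
# next to a constant-amplitude component at the same frequency.
onset = exp_sinusoidal_model([(0.01, 0.05, 0.2 * math.pi, 0.0)], 80)
steady = exp_sinusoidal_model([(1.0, 0.0, 0.2 * math.pi, 0.0)], 80)
```

A constant-amplitude model has to represent the growing envelope of `onset` with a single average amplitude per component, which is the mismatch the exponential term is meant to remove.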
|
SP-15.5
|
Harmonic+Noise Coding Using Improved V/UV Mixing and Efficient Spectral Quantization
Eric W. M. Yu,
Cheung-Fat Chan (City University of Hong Kong)
This paper presents a harmonic+noise speech coder which uses an efficient spectral quantization technique and a novel voiced/unvoiced (V/UV) mixing model. The harmonic magnitudes are coded at 23 bits/frame using the magnitude response of a linear predictive coding (LPC) system. The difference between the harmonic magnitudes and the sampled magnitude response is minimized by a closed-loop approach. The V/UV mixing is modeled by a smooth function which is derived from the speech spectrum envelope based on the flatness measure. The V/UV mixing model allows noise to be added in the harmonic portion of the speech spectrum so that buzziness is reduced. Because the V/UV mixing information is determined from the spectral parameters available in the decoder, no bits are needed for transmitting the V/UV information. A 1.4 kbps harmonic coder is developed. The speech quality of the coder is comparable to other harmonic coders operating at higher rates.
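The abstract above does not define its flatness measure or the smooth mixing function derived from it. As a hedged illustration only, the sketch below computes the standard spectral flatness measure (geometric mean divided by arithmetic mean of the power spectrum), which is one common way to quantify how noise-like a spectrum is. The function name and the `floor` parameter are hypothetical.

```python
import math

def spectral_flatness(power_spectrum, floor=1e-12):
    """Geometric mean over arithmetic mean of a power spectrum.

    Values near 1 indicate a flat, noise-like spectrum; values near 0
    indicate a peaky, strongly harmonic spectrum.
    """
    # The floor guards against log(0); it is an implementation convenience.
    values = [max(p, floor) for p in power_spectrum]
    log_mean = sum(math.log(p) for p in values) / len(values)
    arithmetic_mean = sum(values) / len(values)
    return math.exp(log_mean) / arithmetic_mean

flat = spectral_flatness([1.0, 1.0, 1.0, 1.0])
peaky = spectral_flatness([100.0, 1.0, 1.0, 1.0])
```

Because such a measure can be computed from the decoded spectral envelope alone, a decoder could evaluate it without side information, which is consistent with the abstract's statement that no V/UV bits are transmitted.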
|
SP-15.6
|
A 4 Kb/s Toll Quality Harmonic Excitation Linear Predictive Speech Coder
Suat Yeldener (COMSAT Laboratories, Clarksburg, Maryland, USA)
The Harmonic Excitation Linear Predictive Speech Coder
(HE-LPC) is a technique derived from MBE- and MB-LPC-type
speech coding algorithms. The HE-LPC coder has
the potential of producing high quality speech at 4.8
kb/s and below. This coder employs a new pitch
estimation and voicing technique. In addition, new DCT
based LPC and residual amplitude quantization
techniques have been developed. The 4 kb/s HE-LPC coder
with a 14th order LPC filter was found to produce much
better speech quality than various low-rate speech
coding standards, including the 3.6 kb/s INMARSAT Mini-M
AMBE vocoder. In a formal ITU ACR test, the 4 kb/s
HE-LPC vocoder was found to produce performance
equivalent to 32 kb/s ADPCM and G.729 for both flat
and modified IRS filtered clean input speech conditions.
The HE-LPC algorithm can also be extended to cover bit
rates in the 1.2 to 8 kb/s range, depending on the
application.
|
SP-15.7
|
High Quality MELP Coding at Bit-Rates Around 4 kb/s
Jacek Stachurski,
Alan McCree,
Vishu Viswanathan (Texas Instruments)
Recently, a number of coding techniques have been reported to achieve near
toll quality synthesized speech at bit-rates around 4 kb/s.
These include variants of Code Excited Linear Prediction (CELP), Sinusoidal
Transform Coding (STC) and Multi-Band Excitation (MBE).
While CELP has been an effective technique for bit-rates above 6 kb/s, STC,
MBE, Waveform Interpolation (WI) and Mixed Excitation Linear Prediction (MELP)
models seem to be attractive at bit-rates below 3 kb/s.
In this paper, we present a system to encode speech with high quality using
MELP, a technique previously demonstrated to be effective at bit-rates of
1.6 to 2.4 kb/s.
We have enhanced the MELP model to produce significantly higher speech
quality at bit-rates above 2.4 kb/s.
We describe the development and testing of a high quality 4 kb/s MELP coder.
|
SP-15.8
|
Pitch Quantization in Low Bit-Rate Speech Coding
Thomas Eriksson,
Hong-Goo Kang (AT&T Labs Research, SIPS, 180 Park Avenue, Florham Park, NJ 07932)
This paper describes a new pitch quantization method for
low bit-rate speech coding systems.
The logarithm of the pitch period is quantized by a combination of
two uniform quantizers, one
working directly on logarithmic pitch values and the other working
on the difference between current and previous logarithmic pitch.
The better of the two output values is transmitted to the receiver.
This scheme can exploit both redundancy in the signal and properties
of the ear to achieve an efficient quantization.
Listening tests show that the proposed scheme allows the pitch
parameter to be quantized using 4 bits, with no degradation in audible
quality.
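The two-quantizer scheme described above can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: the pitch range, the difference range, the 8 levels per quantizer, and the function names are all hypothetical. The level count was picked so that a 1-bit mode selector plus a 3-bit index would total 4 bits, but the abstract does not say how the 4 bits are actually allocated.

```python
import math

def uniform_quantize(value, low, high, levels):
    """Return (index, reconstruction) for a uniform quantizer on [low, high]."""
    step = (high - low) / (levels - 1)
    index = round((value - low) / step)
    index = max(0, min(levels - 1, index))  # clip out-of-range inputs
    return index, low + index * step

def quantize_log_pitch(pitch, prev_log_pitch, levels=8,
                       pitch_range=(20.0, 160.0), delta_range=(-0.1, 0.1)):
    """Quantize log pitch both ways and keep the closer reconstruction.

    All ranges and the level count are assumed values for illustration.
    """
    log_pitch = math.log(pitch)
    # Quantizer 1: works directly on the logarithmic pitch value.
    abs_idx, abs_rec = uniform_quantize(
        log_pitch, math.log(pitch_range[0]), math.log(pitch_range[1]), levels)
    # Quantizer 2: works on the difference from the previous log pitch.
    dif_idx, dif_delta = uniform_quantize(
        log_pitch - prev_log_pitch, delta_range[0], delta_range[1], levels)
    dif_rec = prev_log_pitch + dif_delta
    if abs(dif_rec - log_pitch) <= abs(abs_rec - log_pitch):
        return "differential", dif_idx, dif_rec
    return "absolute", abs_idx, abs_rec

# Slowly varying pitch favors the differential quantizer; a jump does not.
smooth = quantize_log_pitch(50.0, math.log(49.0))
jump = quantize_log_pitch(120.0, math.log(40.0))
```

In a real coder `prev_log_pitch` would be the previously reconstructed value, so that encoder and decoder stay in step; the example passes an unquantized value only for brevity.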
|
|