|
Abstract: Session SP-6 |
|
SP-6.1
|
Performance assessment of tandem connection of enhanced cellular coders
Simao F Campos Neto,
Franklin L Corcoran (COMSAT Laboratories),
Ara Karahisar (Teleglobe International)
The growth and increased competition in the
second-generation (digital) cellular communication
market have led service providers to improve the
speech quality in their systems by introducing
enhanced speech coders. Advancements in speech
coding allowed designers to aim at toll quality
for these enhanced coders, and investigation of
the impact of speech coders on the end-to-end
quality of the public switched telephone network
(PSTN) is necessary. This paper presents the
continuation of a series of studies on the impact
of tandem connection of cellular systems, where
the quality of the enhanced cellular coders for
major systems in use today is studied in the
context of PSTN interconnection. A major
conclusion of this study is that deployment of
enhanced coders in second-generation cellular
systems makes possible a substantial increase in
quality of the cellular connections when in tandem
with other speech coders in long haul
international networks.
|
SP-6.2
|
TTS based very low bit rate speech coder
Ki-Seung Lee,
Richard V. Cox (AT&T Labs-Research, SIPS Lab)
This paper addresses a speech coder which uses a Text-To-Speech (TTS)
synthesis system to achieve very low bit rates (below 1 kbps).
The main issue of the work is the accurate coding of the pitch (F0) and gain
contours, which are principal components of prosody.
This is of paramount interest since
the correct prosody will increase naturalness and an efficient coding scheme will
provide high coding gain. Together with the phonetic transcription,
the F0 and gain contour constitute the parameters that are necessary
for the TTS system to synthesize the speech signal.
Piecewise linear approximation is used to code the F0 parameter.
A technique which minimizes bit rate while maintaining F0 error
below a given threshold is described.
To obtain both high compression and smoothly changing gain contours,
the variance of the signal, averaged over each half-phoneme length, is
transmitted as gain information.
With single speaker stimuli, and a priori text transcription information,
we obtained natural-sounding speech at an average bit rate of about 300 bps.
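
The idea of approximating an F0 contour with line segments while keeping the error under a threshold can be sketched as follows. This is a simple greedy illustration, not the bit-rate-minimizing technique of the paper, whose details the abstract does not give. The function names, the frame-indexed list representation of the contour, and the `max_err` parameter are all assumptions made for the example.

```python
def _segment_error(f0, i, j):
    """Maximum absolute deviation of f0[i..j] from the straight line i -> j."""
    slope = (f0[j] - f0[i]) / (j - i)
    return max(abs(f0[k] - (f0[i] + slope * (k - i))) for k in range(i, j + 1))


def piecewise_linear_breakpoints(f0, max_err):
    """Greedily choose breakpoint indices (hypothetical helper).

    Each segment is extended one frame at a time and closed as soon as the
    next extension would push the interpolation error above max_err.
    """
    n = len(f0)
    if n == 0:
        return []
    breakpoints = [0]
    start = 0
    while start < n - 1:
        end = start + 1  # two adjacent frames always fit a line exactly
        while end + 1 < n and _segment_error(f0, start, end + 1) <= max_err:
            end += 1
        breakpoints.append(end)
        start = end
    return breakpoints


def reconstruct(f0, breakpoints):
    """Rebuild the contour by linear interpolation between breakpoints."""
    if not breakpoints:
        return []
    out = []
    for a, b in zip(breakpoints, breakpoints[1:]):
        slope = (f0[b] - f0[a]) / (b - a)
        out.extend(f0[a] + slope * (k - a) for k in range(a, b))
    out.append(f0[breakpoints[-1]])
    return out


if __name__ == "__main__":
    contour = [100, 102, 104, 106, 104, 102, 100]  # made-up F0 values in Hz
    points = piecewise_linear_breakpoints(contour, max_err=0.5)
    print(points)
    print(reconstruct(contour, points))
```

Only the breakpoint positions and their F0 values would need to be transmitted in such a scheme; a larger `max_err` yields fewer breakpoints at the cost of a coarser contour.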
|
SP-6.3
|
WIDEBAND SPEECH CODING WITH TOLL QUALITY BASED ON IA-MODEL
Ling Kok Ng,
Gang Li (School of EEE, Nanyang Technological University, Singapore 639798),
Xiao Lin (Center for Signal Processing, Nanyang Technological University, Singapore 639798),
Guoan Bi (School of EEE, Nanyang Technological University, Singapore 639798)
In this paper, we propose an instantaneous amplitude
(IA) based model for speech signal representation.
This can avoid the difficulty in dealing with the
time-varying phases and allows us to perform an
optimization procedure easily such that the synthetic
signal can be made as close to the original one as
possible. A simplified frequency-picking algorithm
is derived to shorten the processing time while still
maintaining the quality of the synthetic speech.
Experiments show that the synthetic speech with the
developed technique is of toll quality and almost
perceptually indistinguishable from the original
speech. Initial work on coding the parameters of the
IA model for speech sampled at 16 kHz has been done,
and toll-quality synthesized speech at a bit rate
of 40 kbps is achieved.
|
SP-6.4
|
4 kb/s Multi-Pulse Based CELP Speech Coding Using Excitation Switching
Kazunori Ozawa (NEC Corporation)
This paper proposes MP-CELP (Multi-Pulse-based CELP)
speech coding at 4 kb/s. In MP-CELP, amplitudes or signs of
multi-pulse excitation are simultaneously vector quantized (VQ).
In order to improve speech quality for background noise conditions,
the excitation signal is switched between voiced and unvoiced speech,
and the number of pulses is greatly increased for unvoiced speech
by restricting pulse locations. Further, in order to improve voiced
speech quality, the optimal combination among adaptive codebook lag,
pulse location, sign codevector and gain codevector is selected
which minimizes distortion by employing delayed-decision search.
The subjective evaluation results show that speech quality
for 4 kb/s MP-CELP is close to that for ITU-T G.723.1 (6.3 kb/s)
and G.729 (8 kb/s) in M-IRS clean speech condition.
For background noise conditions, the introduction of excitation
switching and pulse location restriction significantly improves the MOS
value by 0.4. However, further improvement is still required, except
for the interfering talker condition.
|
SP-6.5
|
An Adaptive Multi-Rate Speech Coder For Digital Cellular Telephony
Erdal Paksoy (Texas Instruments),
Juan Carlos De Martin (Polytechnic of Turin),
Alan V McCree (Texas Instruments),
Christian G Gerlach (Alcatel SEL AG),
Anand Anandakumar,
Wai-Ming Lai,
Vishu Viswanathan (Texas Instruments)
We have developed an adaptive multi-rate (AMR) speech coder
designed to operate under the GSM digital cellular full rate (22.8
kb/s) and half rate (11.4 kb/s) channels and to maintain high quality
in the presence of highly varying background noise and channel
conditions. Within each total rate, several codec modes with
different source/channel bit rate allocations are used. The speech
coders in each codec mode are based on the CELP algorithm
operating at rates ranging from 11.85 kb/s down to 5.15 kb/s, where
the lowest rate coder is a source controlled multi-modal speech
coder. The decoders monitor channel quality at both ends of the
wireless link using the soft values for the received bits and assist
the base station in selecting the codec mode that is appropriate for
a given channel condition. The coder was submitted to the GSM
AMR standardization competition and met the qualification
requirements in an independent formal MOS test.
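
The source/channel trade-off described above comes down to simple arithmetic: whatever part of the gross channel rate the speech coder does not use is left for channel coding. The sketch below works only with the rates quoted in the abstract (22.8 and 11.4 kb/s gross, 11.85 and 5.15 kb/s source). The intermediate codec modes are not listed in the abstract and are left out. The selection rule, the `quality` score and its `threshold` are hypothetical placeholders, not the coder's actual mode-adaptation logic.

```python
# Rates in bits per second, taken from the abstract.
GROSS_RATE_BPS = {"full": 22800, "half": 11400}
SOURCE_RATES_BPS = [11850, 5150]  # only the two endpoint rates are quoted


def channel_coding_bps(channel, source_bps):
    """Bits per second left for channel coding in a given channel type."""
    gross = GROSS_RATE_BPS[channel]
    if source_bps > gross:
        raise ValueError("source rate does not fit in this channel")
    return gross - source_bps


def select_source_rate(channel, quality, threshold=0.5):
    """Hypothetical rule: a good channel gets the highest source rate that
    fits, a poor channel gets the lowest (leaving more room for protection).

    `quality` is an assumed score in [0, 1]; the abstract only says that the
    decoders derive channel quality from soft bit values.
    """
    feasible = sorted(r for r in SOURCE_RATES_BPS if r <= GROSS_RATE_BPS[channel])
    return feasible[-1] if quality >= threshold else feasible[0]


if __name__ == "__main__":
    for channel in GROSS_RATE_BPS:
        for quality in (0.9, 0.1):
            rate = select_source_rate(channel, quality)
            print(channel, quality, rate, channel_coding_bps(channel, rate))
```

With these figures, the 11.85 kb/s coder leaves 10.95 kb/s for channel coding in the full rate channel, while the 5.15 kb/s coder leaves 17.65 kb/s there and 6.25 kb/s in the half rate channel.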
|
SP-6.6
|
An Adaptive Post-Filtering Technique Based on The Modified Yule-Walker Filter
Azhar Mustapha,
Suat Yeldener (COMSAT Laboratories, Clarksburg, Maryland, USA)
This paper presents an adaptive time-domain
post-filtering technique based on the modified
Yule-Walker filter. Conventionally, post-filtering is
derived from an original LPC spectrum. In general,
this time-domain technique produces unpredictable
spectral tilt that is hard to control by the
modified LPC synthesis, inverse and high pass filtering
and causes unnecessary attenuation or amplification of
some frequency components, which muffles the
speech. This effect increases when voice coders
are connected in tandem. Another approach to designing a
post-filter was developed by McAulay and Quatieri, but it
can only be used in sinusoidal-based speech coders.
We have developed a new time-domain
post-filtering technique. This technique eliminates the
problem of spectral tilt in the speech spectrum and can
be applied to various speech coders. The new
post-filter has a flat frequency response at the
formant peaks of speech spectrum. Instead of looking
at the modified LPC synthesis, inverse, and high pass
filtering in the conventional time-domain technique, we
gather information about the poles of the LPC
spectrum in the new technique. This post-filtering
technique has been used in a 4 kb/s Harmonic Excitation
Linear Predictive Coder (HE-LPC), and subjective
listening tests have indicated that this technique
outperforms the conventional one in both one and two
tandem connections.
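
As a rough illustration of what "information about the poles of the LPC spectrum" can mean, the sketch below reads a resonance frequency and bandwidth off the poles of a second-order all-pole filter. It is not the post-filter design of the paper, which the abstract does not specify. The function names, the second-order restriction and the 8 kHz sampling rate in the example are assumptions chosen to keep the code short.

```python
import cmath
import math


def second_order_poles(a1, a2):
    """Roots of z^2 + a1*z + a2, i.e. the poles of 1 / (1 + a1 z^-1 + a2 z^-2)."""
    disc = cmath.sqrt(a1 * a1 - 4.0 * a2)
    return (-a1 + disc) / 2.0, (-a1 - disc) / 2.0


def pole_to_formant(pole, fs_hz):
    """Map a pole inside the unit circle to (frequency, bandwidth) in Hz.

    Uses the standard relations f = angle * fs / (2*pi) and
    bandwidth = -ln(radius) * fs / pi.
    """
    radius = abs(pole)
    freq_hz = abs(cmath.phase(pole)) * fs_hz / (2.0 * math.pi)
    bandwidth_hz = -math.log(radius) * fs_hz / math.pi
    return freq_hz, bandwidth_hz


if __name__ == "__main__":
    # Made-up resonance: pole radius 0.9 at angle pi/4, sampled at 8 kHz.
    r, theta, fs = 0.9, math.pi / 4.0, 8000.0
    a1, a2 = -2.0 * r * math.cos(theta), r * r
    pole, _ = second_order_poles(a1, a2)
    print(pole_to_formant(pole, fs))
```

Poles close to the unit circle correspond to narrow, strong formant peaks, which are the regions where the abstract says the new post-filter has a flat response.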
|
SP-6.7
|
A Modular Approach to Speech Enhancement with an Application to Speech Coding
Anthony J Accardi,
Richard V Cox (AT&T Labs - Research, Florham Park, NJ 07932)
Ephraim and Malah's MMSE-LSA speech enhancement algorithm, while robust
and effective, is difficult to tune and adjust for the tradeoff between noise
reduction and distortion. We suggest a means of generalizing this
design, which allows for other estimators besides the MMSE-LSA to be used
within the same supporting framework. When a modified version of Ephraim
and Van Trees's spectral domain constrained signal subspace estimator is
used in this manner, we obtain a system with greater flexibility and
similar performance. We also explore the possibility of using different
speech enhancement techniques as pre-processors for different parameter
extraction modules of the IS-641 speech coder. We show that such a
strategy can increase the quality of the coded speech and lead to a
system that is more robust to differing noise types.
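
One way to picture a framework in which the estimator can be swapped is to keep the frame-by-frame SNR tracking fixed and pass the gain rule in as a function. The sketch below assumes that the shared framework is Ephraim and Malah's decision-directed a priori SNR estimate, and it uses a Wiener gain as the plug-in rule purely for illustration. Neither the MMSE-LSA gain nor the signal subspace estimator is implemented here, and the function names, the smoothing constant `alpha`, and the list-of-magnitudes input format are assumptions.

```python
def wiener_gain(xi, gamma):
    """Illustrative plug-in gain rule; gamma is unused by this particular rule."""
    return xi / (1.0 + xi)


def enhance_frames(noisy_mags, noise_power, gain_fn=wiener_gain, alpha=0.98):
    """Apply a pluggable spectral gain to a sequence of magnitude spectra.

    noisy_mags:  list of frames, each a list of spectral magnitudes per bin
    noise_power: estimated noise power per bin (assumed known and positive)
    gain_fn:     any function (xi, gamma) -> gain
    """
    prev_clean_power = [0.0] * len(noise_power)
    enhanced = []
    for frame in noisy_mags:
        out = []
        for k, mag in enumerate(frame):
            gamma = mag * mag / noise_power[k]  # a posteriori SNR
            # Decision-directed a priori SNR estimate.
            xi = (alpha * prev_clean_power[k] / noise_power[k]
                  + (1.0 - alpha) * max(gamma - 1.0, 0.0))
            amp = gain_fn(xi, gamma) * mag
            prev_clean_power[k] = amp * amp
            out.append(amp)
        enhanced.append(out)
    return enhanced


if __name__ == "__main__":
    frames = [[2.0, 0.5], [3.0, 0.4], [3.0, 0.6]]  # made-up magnitudes
    print(enhance_frames(frames, noise_power=[1.0, 1.0]))
```

Swapping in a different estimator then amounts to passing another `gain_fn`, which leaves the SNR tracking untouched.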
|
SP-6.8
|
On Speech Coding in a Perceptual Domain
Gernot Kubin (Vienna University of Technology),
W. Bastiaan Kleijn (KTH (Royal Institute of Technology))
In many speech coders, the distortion criterion operates on
the speech signal or a signal obtained by adaptive linear filtering of
the speech signal. To satisfy computational and delay constraints,
the distortion criterion must be reduced to a very simple
approximation of the auditory system. This drawback of
conventional approaches motivates a new speech coding paradigm in
which the coding is performed in a domain where the single-letter
squared-error criterion forms an accurate representation of
perception. The new paradigm requires a model of the auditory
periphery which is accurate, can be inverted with relatively low
computational effort, and which represents the signal with relatively
few parameters. In this paper we develop such a model of the auditory
periphery and discuss its suitability for speech coding. Our results
indicate that the new paradigm in general and our auditory model in
particular form a promising basis for speech and audio coding.
|
|