Authors:
Simao F Campos Neto,
Franklin L Corcoran,
Ara Karahisar,
Page (NA) Paper number 1313
Abstract:
The growth and increased competition in the second-generation (digital)
cellular communication market has led service providers to improve
the speech quality in their systems by introducing enhanced speech
coders. Advances in speech coding have allowed designers to aim at toll quality
for these enhanced coders, and investigation of the impact of speech
coders on the end-to-end quality of the public switched telephone network
(PSTN) is necessary. This paper presents the continuation of a series
of studies on the impact of tandem connection of cellular systems,
where the quality of the enhanced cellular coders for major systems
in use today is studied in the context of PSTN interconnection. A major
conclusion of this study is that deployment of enhanced coders in second-generation
cellular systems makes possible a substantial increase in quality of
the cellular connections when in tandem with other speech coders in
long haul international networks.
Authors:
Ki-Seung Lee,
Richard V. Cox,
Page (NA) Paper number 1457
Abstract:
This paper addresses a speech coder which uses a Text-To-Speech (TTS)
synthesis system to achieve very low bit rates (sub 1kbps). The main
issue of the work is the accurate coding of the pitch (F0) and gain
contours, which are principal components of prosody. This is of paramount
interest since the correct prosody will increase naturalness and an
efficient coding scheme will provide high coding gain. Together with
the phonetic transcription, the F0 and gain contour constitute the
parameters that are necessary for the TTS system to synthesize the
speech signal. Piecewise linear approximation is used to code the F0
parameter. A technique that minimizes bit rate while maintaining F0
error below a given threshold is described. To obtain both high compression
and smoothly changing gain contours, the variance of the signal
averaged over each half-phoneme length is transmitted as gain information.
With single speaker stimuli, and a priori text transcription information,
we obtained natural-sounding speech at an average bit rate of about
300 bps.
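
As a rough illustration of the F0 coding idea in this abstract, the Python sketch below picks breakpoints greedily so that linear interpolation between them stays within a given error threshold. The greedy search, the function names, and the sample contour in the usage note are assumptions made for illustration; the abstract does not specify the paper's actual optimization procedure.

```python
def _max_interp_error(f0, i, j):
    """Largest absolute deviation between f0[i..j] and the chord from i to j."""
    slope = (f0[j] - f0[i]) / (j - i)
    return max(abs(f0[k] - (f0[i] + slope * (k - i))) for k in range(i, j + 1))


def piecewise_linear_breakpoints(f0, max_err):
    """Greedy breakpoint selection (a heuristic, not the paper's method).

    Each segment is extended to the right until the next extension would
    push the interpolation error above max_err (same units as f0).
    """
    n = len(f0)
    if n == 0:
        return []
    breakpoints = [0]
    start = 0
    while start < n - 1:
        end = start + 1
        while end + 1 < n and _max_interp_error(f0, start, end + 1) <= max_err:
            end += 1
        breakpoints.append(end)
        start = end
    return breakpoints


def reconstruct(breakpoints, values):
    """Decoder side: rebuild the contour from breakpoint indices and values."""
    if not breakpoints:
        return []
    out = [float(values[0])]
    pairs = list(zip(breakpoints, values))
    for (i, vi), (j, vj) in zip(pairs, pairs[1:]):
        for k in range(i + 1, j + 1):
            out.append(vi + (vj - vi) * (k - i) / (j - i))
    return out
```

For a hypothetical contour such as `[100, 102, 104, 106, 104, 102, 100]` with `max_err=0.5`, only the breakpoint positions and their F0 values would need to be transmitted; a larger threshold generally yields fewer breakpoints and therefore a lower bit rate.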
Authors:
Ling Kok Ng, School of EEE, Nanyang Technological University, Singapore 639798 (Singapore)
Gang Li, School of EEE, Nanyang Technological University, Singapore 639798 (Singapore)
Xiao Lin, Center for Signal Processing, Nanyang Technological University, Singapore 639798 (Singapore)
Guoan Bi, School of EEE, Nanyang Technological University, Singapore 639798 (Singapore)
Page (NA) Paper number 1516
Abstract:
In this paper, we propose an instantaneous amplitude (IA) based model
for speech signal representation. This can avoid the difficulty in
dealing with the time-varying phases and allows us to perform an optimization
procedure easily such that the synthetic signal can be made as close
to the original one as possible. A simplified frequency-picking algorithm
is derived to shorten the processing time while still maintaining the
quality of the synthetic speech. Experiments show that the synthetic
speech with the developed technique is of toll quality and almost perceptually
indistinguishable from the original speech. Initial work on coding
the parameters of the IA model for speech sampled at 16 kHz has been
done, and toll-quality synthesized speech at a bit rate of 40 kbps
is achieved.
Authors:
Kazunori Ozawa,
Page (NA) Paper number 1656
Abstract:
This paper proposes MP-CELP (Multi-Pulse-based CELP) speech coding
at 4 kb/s. In MP-CELP, amplitudes or signs of the multi-pulse excitation
are simultaneously vector quantized (VQ). In order to improve speech
quality for background noise conditions, the excitation signal is switched
between voiced and unvoiced speech, and the number of pulses is greatly
increased for unvoiced speech by restricting pulse locations. Further,
in order to improve voiced speech quality, the combination of
adaptive codebook lag, pulse location, sign codevector and gain
codevector that minimizes distortion is selected by employing a delayed-decision
search. The subjective evaluation results show that speech quality
for 4 kb/s MP-CELP is close to that for ITU-T G.723.1 (6.3 kb/s) and
G.729 (8 kb/s) in M-IRS clean speech condition. For background noise
conditions, the introduction of the excitation switching and the pulse
location restriction significantly improves the MOS value by 0.4. However,
further improvement is still required, except for the interfering talker
condition.
Authors:
Erdal Paksoy,
Juan Carlos De Martin,
Alan V McCree,
Christian G Gerlach,
Anand Anandakumar,
Wai-Ming Lai,
Vishu R Viswanathan,
Page (NA) Paper number 1986
Abstract:
We have developed an adaptive multi-rate (AMR) speech coder designed
to operate under the GSM digital cellular full rate (22.8 kb/s) and
half rate (11.4 kb/s) channels and to maintain high quality in the
presence of highly varying background noise and channel conditions.
Within each total rate, several codec modes with different source/channel
bit rate allocations are used. The speech coders in each codec mode
are based on the CELP algorithm operating at rates ranging from 11.85
kb/s down to 5.15 kb/s, where the lowest rate coder is a source controlled
multi-modal speech coder. The decoders monitor channel quality at both
ends of the wireless link using the soft values for the received bits
and assist the base station in selecting the codec mode that is appropriate
for a given channel condition. The coder was submitted to the GSM AMR
standardization competition and met the qualification requirements
in an independent formal MOS test.
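
The split between source and channel coding bits described in this abstract can be sketched in Python as follows. The gross channel rates and the two source rates (11.85 and 5.15 kb/s) come from the abstract; the quality scale from 0 to 1, the per-mode quality thresholds, and all function names are hypothetical and serve only to illustrate the idea of mode selection.

```python
FULL_RATE_KBPS = 22.8   # GSM full-rate gross channel rate (from the abstract)
HALF_RATE_KBPS = 11.4   # GSM half-rate gross channel rate (from the abstract)


def channel_coding_rate(gross_kbps, source_kbps):
    """Bits per second (in kb/s) left for channel coding in a codec mode."""
    if source_kbps > gross_kbps:
        raise ValueError("source rate does not fit in the channel")
    return round(gross_kbps - source_kbps, 2)


def select_mode(gross_kbps, modes, quality):
    """Pick a source rate for the current channel quality.

    modes   : list of (source_kbps, min_quality) pairs; the thresholds
              are hypothetical, not taken from the paper.
    quality : channel quality estimate in [0, 1] (assumed scale).
    Returns the highest source rate that fits the channel and whose
    threshold is met, falling back to the lowest rate that fits.
    """
    fitting = [(rate, q) for rate, q in modes if rate <= gross_kbps]
    if not fitting:
        raise ValueError("no codec mode fits in the channel")
    allowed = [rate for rate, q in fitting if quality >= q]
    if allowed:
        return max(allowed)
    return min(rate for rate, _ in fitting)
```

With these assumptions, a good channel leaves room for a higher source rate and less error protection, while a poor channel shifts bits from the speech coder to channel coding. Note that the 11.85 kb/s coder can only be used in the full-rate channel, since it exceeds the half-rate gross rate.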
Authors:
Azhar Mustapha, COMSAT Laboratories, Clarksburg, Maryland, USA (USA)
Suat Yeldener, COMSAT Laboratories, Clarksburg, Maryland, USA (USA)
Page (NA) Paper number 2024
Abstract:
This paper presents an adaptive time-domain post-filtering technique
based on the modified Yule-Walker filter. Conventionally, post-filtering
is derived from the original LPC spectrum. In general, this conventional
time-domain technique produces an unpredictable spectral tilt that is hard
to control with the modified LPC synthesis, inverse, and high-pass filtering,
and it causes unnecessary attenuation or amplification of some frequency
components, which makes the speech sound muffled. This effect increases when
voice coders are connected in tandem. Another approach to designing a
post-filter was developed by McAulay and Quatieri, but it can only be
used in sinusoidal-based speech coders. We have developed a
new time-domain post-filtering technique that eliminates
the problem of spectral tilt in the speech spectrum and can be applied
to various speech coders. The new post-filter has a flat frequency
response at the formant peaks of the speech spectrum. Instead of looking
at the modified LPC synthesis, inverse, and high pass filtering in
the conventional time-domain technique, we gather information about
the poles of the LPC spectrum in the new technique. This post-filtering
technique has been used in a 4 kb/s Harmonic Excitation Linear Predictive
Coder (HE-LPC), and subjective listening tests have indicated that
this technique outperforms the conventional one in both one and two
tandem connections.
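
For context, the conventional time-domain post-filter that this abstract contrasts itself with is commonly written as H(z) = [A(z/beta) / A(z/alpha)] (1 - mu z^-1), combining a modified LPC inverse filter, a modified LPC synthesis filter, and a first-order high-pass section. The Python sketch below evaluates the magnitude response of that conventional form. The formula, the parameter defaults, and the function names are assumptions based on common practice, not details taken from the paper, and the sketch does not implement the proposed Yule-Walker-based filter.

```python
import cmath


def bandwidth_expand(a, gamma):
    """Coefficients of A(z/gamma), given A(z) = sum_k a[k] z^-k with a[0] = 1."""
    return [ak * gamma ** k for k, ak in enumerate(a)]


def poly_response(coeffs, omega):
    """Evaluate sum_k coeffs[k] * exp(-j * omega * k) at radian frequency omega."""
    return sum(c * cmath.exp(-1j * omega * k) for k, c in enumerate(coeffs))


def postfilter_gain(a, omega, alpha=0.8, beta=0.5, mu=0.5):
    """Magnitude of the conventional post-filter at frequency omega.

    alpha, beta, and mu are illustrative defaults (assumed), with
    0 < beta < alpha < 1 and mu controlling the tilt compensation.
    """
    num = poly_response(bandwidth_expand(a, beta), omega)   # modified inverse filter
    den = poly_response(bandwidth_expand(a, alpha), omega)  # modified synthesis filter
    tilt = poly_response([1.0, -mu], omega)                 # first-order high-pass
    return abs(num / den * tilt)
```

For a hypothetical first-order predictor `a = [1.0, -0.9]` and `mu=0`, the gain at `omega = 0` exceeds the gain at `omega = cmath.pi`, which shows the low-pass tilt that the pole-zero section introduces and that the high-pass term is meant to offset.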
Authors:
Anthony J Accardi,
Richard V. Cox,
Page (NA) Paper number 2099
Abstract:
Ephraim and Malah's MMSE-LSA speech enhancement algorithm, while robust
and effective, is difficult to tune and adjust for the tradeoff between
noise reduction and distortion. We suggest a means of generalizing
this design, which allows for other estimators besides the MMSE-LSA
to be used within the same supporting framework. When a modified version
of Ephraim and Van Trees's spectral domain constrained signal subspace
estimator is used in this manner, we obtain a system with greater flexibility
and similar performance. We also explore the possibility of using different
speech enhancement techniques as pre-processors for different parameter
extraction modules of the IS-641 speech coder. We show that such a
strategy can increase the quality of the coded speech and lead to a
system that is more robust to differing noise types.
Authors:
Gernot Kubin,
W. Bastiaan Kleijn,
Page (NA) Paper number 2327
Abstract:
In many speech coders, the distortion criterion operates on the speech
signal or a signal obtained by adaptive linear filtering of the speech
signal. To satisfy computational and delay constraints, the distortion
criterion must be reduced to a very simple approximation of the auditory
system. This drawback of conventional approaches motivates a new speech
coding paradigm in which the coding is performed in a domain where
the single-letter squared-error criterion forms an accurate representation
of perception. The new paradigm requires a model of the auditory periphery
which is accurate, can be inverted with relatively low computational
effort, and which represents the signal with relatively few parameters.
In this paper we develop such a model of the auditory periphery and
discuss its suitability for speech coding. Our results indicate that
the new paradigm in general and our auditory model in particular form
a promising basis for speech and audio coding.