Authors:
Nicola R Chong,
Ian S Burnett,
Joe F Chicharo,
Page (NA) Paper number 1419
Abstract:
For efficient coding of speech, it is desirable to separate the slowly
and rapidly evolving spectral components to take advantage of their
different perceptual qualities. In this paper, we present a multi-level
wavelet decomposition mechanism, using low-delay FIR filters, applied
to Waveform Interpolation coding. The technique overcomes the substantial
delay problems of [2] and identifies a preferred technique for the
quantisation of the decomposed surfaces. Phase is shown to be particularly
sensitive to the compounding of quantisation errors within the tree-structured
transform. The proposed solution involves the use of VDVQ on separately
decomposed magnitude/phase surfaces. This approach provides for coarse
or no phase quantisation while maintaining high speech quality. The
techniques discussed may also be applied to other transforms and to
the quantisation of surfaces in the standard Waveform Interpolation
coder.
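The multi-level split into slowly and rapidly evolving components can be illustrated with a minimal tree-structured decomposition. This is only a sketch under stated assumptions: it uses the 2-tap Haar filter pair as a stand-in for the low-delay FIR filters (the abstract does not name the filters), and the function names `haar_step` and `haar_decompose` are hypothetical.

```python
import math


def haar_step(x):
    """One analysis level: split x into low-pass (approx) and high-pass (detail) halves.

    Assumption: the orthonormal 2-tap Haar pair stands in for the paper's filters.
    """
    s = 1.0 / math.sqrt(2.0)
    approx = [(x[i] + x[i + 1]) * s for i in range(0, len(x) - 1, 2)]
    detail = [(x[i] - x[i + 1]) * s for i in range(0, len(x) - 1, 2)]
    return approx, detail


def haar_decompose(x, levels):
    """Tree-structured decomposition: re-split the low-pass branch at each level.

    Returns the coarsest approximation (slowly evolving part) and a list of
    detail bands, finest first (rapidly evolving parts).
    """
    approx = list(x)
    details = []
    for _ in range(levels):
        if len(approx) < 2 or len(approx) % 2:
            raise ValueError("length must be divisible by 2**levels")
        approx, detail = haar_step(approx)
        details.append(detail)
    return approx, details


# Example: one sample per pitch cycle of some parameter, evolving over 8 cycles.
slow, fast_bands = haar_decompose([1.0, 1.1, 0.9, 1.0, 1.2, 1.0, 0.8, 1.0], 3)
```

Because each level re-filters the output of the previous one, an error introduced at one level propagates through the rest of the tree, which is the compounding effect the abstract refers to.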
Authors:
Takahiro Unno,
Thomas P Barnwell III,
Kwan Truong,
Page (NA) Paper number 1764
Abstract:
This paper presents an improved Mixed Excitation Linear Prediction
(MELP) coder. MELP is the linear-prediction-based speech coder
that was recently chosen as the new 2400 bps U.S. Federal Standard.
Although MELP performs well, there are still some perceivable
distortions, particularly around non-stationary speech segments and
for some low-pitch male speakers. The key features of our new coder
include a robust pitch detection algorithm, a new plosive analysis/synthesis
method, and a post processor for the Fourier magnitude model. Formal
quality tests are used to show that the new MELP improves the quality
of the U.S. Federal Standard MELP coder while requiring only a small
increase in algorithmic delay and while also retaining compatibility
with the Federal Standard MELP bit-stream specification.
Authors:
Stephane Villette, CCSR, University of Surrey, UK
Milos Stefanovic, CCSR, University of Surrey, UK
Ahmet Kondoz, CCSR, University of Surrey, UK
Page (NA) Paper number 1798
Abstract:
The European Telecommunications Standards Institute (ETSI) has launched
a competition for a new mobile communications standard designed to
provide better performance than the current GSM standard. This standard
is to be called AMR for Adaptive Multi-Rate: the source and channel
coding rates can be adapted depending on the state of the channel,
thus providing an optimal balance between them at any time. The University
of Surrey has submitted a candidate for this competition through the
Mobile VCE. This candidate was the only one amongst eleven to use a
vocoder in the half-rate GSM channel instead of a CELP based coder.
The testing which took place as part of the first stage of the competition
has shown that this candidate was among the best. This paper presents
the system submitted for the half-rate channel as well as the results
of the testing.
Authors:
Milan Jelinek,
Jean-Pierre Adoul,
Page (NA) Paper number 1818
Abstract:
Estimating the spectral envelope in the frequency domain avoids
some problems of Linear Prediction (LP) algorithms for voiced speech.
We present a low complexity method of spectral envelope estimation
from harmonics for low rate coding. The method consists of computing
the harmonic amplitude spectrum using a pitch-synchronous DFT whose length
depends on voicing, modifying this spectrum outside the telephone
bandwidth to simplify modeling of the useful bandwidth, and interpolating
it with a frequency-domain low-pass filter. An all-pole model is then
fitted to this modified smoothed version of the harmonic spectrum.
The method was implemented in the Harmonic-Stochastic Excitation (HSX)
vocoder and its performance was compared with an LP algorithm similar
to that used in the G.729 speech coding standard. A-B comparative tests
show a substantial increase in perceptual quality.
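The final all-pole fitting step can be sketched as follows, assuming the smoothed harmonic spectrum is already available as uniformly spaced power-spectrum samples around the full unit circle. The route shown here (autocorrelation by inverse cosine transform, then the Levinson-Durbin recursion) is a standard way to fit an all-pole model to a spectrum and is an assumption; the abstract does not say how the model is fitted. The function names are hypothetical.

```python
import math


def autocorr_from_power_spectrum(power, order):
    """Autocorrelation lags 0..order from power samples P(w_m), w_m = 2*pi*m/N.

    Assumes the N samples cover the full circle and are symmetric (real signal),
    so the inverse DFT reduces to a cosine sum.
    """
    n = len(power)
    return [
        sum(p * math.cos(2.0 * math.pi * m * k / n) for m, p in enumerate(power)) / n
        for k in range(order + 1)
    ]


def levinson_durbin(r, order):
    """Solve the normal equations for A(z) = 1 + a[1] z^-1 + ... + a[order] z^-order.

    Returns the coefficient list a (with a[0] == 1) and the prediction error power.
    """
    a = [1.0] + [0.0] * order
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k = -acc / err  # reflection coefficient for this order
        new_a = a[:]
        for j in range(1, i):
            new_a[j] = a[j] + k * a[i - j]
        new_a[i] = k
        a = new_a
        err *= 1.0 - k * k
    return a, err


# Example: envelope samples generated from a known one-pole model 1 / (1 - 0.5 z^-1).
power = [1.0 / (1.25 - math.cos(2.0 * math.pi * m / 256)) for m in range(256)]
coeffs, gain_sq = levinson_durbin(autocorr_from_power_spectrum(power, 2), 2)
```

Working from spectrum samples instead of the time-domain signal is what lets the spectrum be modified and smoothed before the model is fitted.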
Authors:
Chunyan Li,
Vladimir Cuperman,
Allen Gersho,
Page (NA) Paper number 1855
Abstract:
Harmonic coders that synthesize speech without transmitting phase information
abandon the benefits of closed-loop parameter estimation via waveform
matching. In this paper, we show that effective closed-loop parameter
estimation can be achieved when a suitable time-scale modification
is applied to the speech LP residual in harmonic coders. The concept
is demonstrated here specifically for pitch estimation, but is more
broadly applicable. For each of a set of pitch candidates generated
by a time-domain pitch estimator, the residual is modified to match
the pitch contour derived from that candidate. The best candidate is
selected by evaluating for each candidate the match between the modified
residual and the synthesized residual. The new pitch estimation algorithm
significantly reduces gross pitch errors compared to a conventional
time-domain pitch estimator and enhances the perceptual performance
of a 4 kbps harmonic coder.
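The candidate-selection loop described above can be sketched in a much simplified form. Important assumptions: the time-scale modification of the residual is omitted entirely, the "synthesized residual" is replaced by a bare unit pulse train, and the match is scored by normalized correlation. None of these choices comes from the abstract, and the function names are hypothetical; the sketch only shows the shape of a closed-loop search over pitch candidates.

```python
import math


def pulse_train(period, length, offset=0):
    """Unit pulses every `period` samples starting at `offset` (stand-in synthesis)."""
    return [
        1.0 if n >= offset and (n - offset) % period == 0 else 0.0
        for n in range(length)
    ]


def normalized_correlation(a, b):
    """Cosine similarity between two equal-length sequences."""
    num = sum(x * y for x, y in zip(a, b))
    den = math.sqrt(sum(x * x for x in a) * sum(y * y for y in b))
    return num / den if den > 0.0 else 0.0


def select_pitch(residual, candidates):
    """Return (period, score) of the candidate whose synthesis best matches residual.

    For each candidate period, every pulse offset within one period is tried,
    and the best-scoring alignment represents that candidate.
    """
    best_period, best_score = None, -1.0
    for period in candidates:
        score = max(
            normalized_correlation(residual, pulse_train(period, len(residual), off))
            for off in range(period)
        )
        if score > best_score:
            best_period, best_score = period, score
    return best_period, best_score


# Example: a residual with true period 40 and candidates at half and double pitch.
period, score = select_pitch(pulse_train(40, 160, 5), [20, 40, 80])
```

In this toy case the half- and double-period candidates score lower than the true period, which illustrates how a closed-loop match can reject the gross pitch errors an open-loop estimator proposes.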
Authors:
Hong-Goo Kang,
D. Sen,
Page (NA) Paper number 2043
Abstract:
This paper describes a method of improving the quality of the Waveform
Interpolation (WI) speech coder by adjustment of the phase information.
In WI, a slowly-evolving waveform (SEW) and a rapidly-evolving waveform
(REW) represent the periodic and the non-periodic part of the signal.
The phase of the synthesized signal is determined by the SEW and REW,
and thus the correct quantization of these parameters is important
for producing natural speech quality. A method is described, whereby
the phase of the synthesized signal is adjusted by modifying the quantized
REW spectrum as a function of the fundamental frequency. This essentially
attempts to correct the discrepancies in phase that arise due to variation
in pitch and also accounts for the difference in noise sensitivity
between female and male speech. The overall effect would be the same
if multiple codebooks (depending on pitch) were used to code the REW
spectrum. Experimental results confirm that the new method results
in significantly improved performance.
Authors:
Jongseo Sohn,
Wonyong Sung,
Page (NA) Paper number 2269
Abstract:
We propose a new excitation model for transitional speech to reduce
the distortion caused by the traditional two-source (voiced and
unvoiced) excitation model. The proposed low resolution pulse position coding
(LRPPC) algorithm detects the existence of pulses in frames of weak
periodicity, which are classified as unvoiced, and transmits the approximate
pulse positions. In the decoder, dispersed pulses that have a flat
magnitude spectrum are synthesized at the decoded positions to form
the excitation signal. A subjective quality test shows that the vocoder
employing the LRPPC algorithm produces better speech quality and
is very robust to mode decision errors.
Authors:
Oded Gottesman, Signal Compression Laboratory, Department of Electrical and Computer Engineering, University of California, Santa Barbara, CA 93106, USA
Page (NA) Paper number 1834
Abstract:
This paper presents an efficient analysis-by-synthesis vector quantizer
for the dispersion phase of the excitation signal which was used to
enhance a waveform-interpolative coder. The scheme can be used to enhance
other harmonic coders, such as the sinusoidal-transform coder and the
multiband-excitation coder. The scheme incorporates perceptual weighting,
and does not require any phase unwrapping. The proposed quantizer achieves
a segmental signal-to-noise ratio of up to 14 dB with as few as 6 bits
of quantization. Subjective testing shows improvement in synthesized speech
quality using the quantized phase over a phase extracted from a male speaker.
The improvement was larger for female speakers.
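For reference, segmental signal-to-noise ratio is commonly computed as the average of per-frame SNRs in dB. The sketch below shows that generic definition only; the frame length, the handling of silent frames, and the function name are assumptions, and the paper's own measure may differ (for example, by applying perceptual weighting).

```python
import math


def segmental_snr_db(reference, decoded, frame_len=160, eps=1e-12):
    """Mean of per-frame SNRs in dB over non-overlapping frames.

    Assumptions: frames with (near) zero reference energy are skipped, and the
    noise energy is floored at eps to avoid division by zero.
    """
    frame_snrs = []
    for start in range(0, len(reference) - frame_len + 1, frame_len):
        ref = reference[start:start + frame_len]
        dec = decoded[start:start + frame_len]
        signal = sum(r * r for r in ref)
        noise = sum((r - d) ** 2 for r, d in zip(ref, dec))
        if signal <= eps:
            continue  # silent frame: no meaningful SNR
        frame_snrs.append(10.0 * math.log10(signal / max(noise, eps)))
    if not frame_snrs:
        raise ValueError("no frames with signal energy")
    return sum(frame_snrs) / len(frame_snrs)


# Example: a decoded signal that is a uniformly attenuated copy of the reference.
reference = [math.sin(0.1 * n) for n in range(320)]
decoded = [0.9 * v for v in reference]
snr = segmental_snr_db(reference, decoded)
```

Averaging in the dB domain weights quiet and loud frames equally, which is why segmental SNR tends to track perceived quality more closely than a single whole-signal SNR.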