Abstract: Session SP-8
|
SP-8.1
|
Low Delay Multi-level Decomposition and Quantisation Techniques for WI Coding
Nicola R Chong,
Ian S Burnett,
Joe F Chicharo (University of Wollongong)
For efficient coding of speech, it is desirable to separate the slowly and rapidly evolving spectral components to take advantage of their different perceptual qualities. In this paper, we present a multi-level wavelet decomposition mechanism, using low-delay FIR filters, applied to Waveform Interpolation (WI) coding. The technique overcomes the substantial delay problems of [2] and identifies a preferred technique for the quantisation of the decomposed surfaces. Phase is shown to be particularly sensitive to the compounding of quantisation errors within the tree-structured transform. The proposed solution uses variable dimension vector quantisation (VDVQ) on separately decomposed magnitude and phase surfaces. This approach allows coarse or no phase quantisation while maintaining high speech quality. The techniques discussed may also be applied to other transforms and to the quantisation of surfaces in the standard Waveform Interpolation coder.
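As a rough illustration of a tree-structured multi-level decomposition, the sketch below repeatedly splits the low-pass branch of a sequence, so that slowly evolving content collects in the final approximation and rapidly evolving content in the detail bands. The two-tap Haar filters and the names `haar_step`, `decompose`, and `reconstruct` are assumptions chosen for brevity; the abstract does not specify the low-delay FIR filters actually used.

```python
import math

def haar_step(x):
    """One analysis level: split an even-length sequence into low and high bands."""
    s = 1.0 / math.sqrt(2.0)
    low = [s * (x[i] + x[i + 1]) for i in range(0, len(x), 2)]
    high = [s * (x[i] - x[i + 1]) for i in range(0, len(x), 2)]
    return low, high

def decompose(x, levels):
    """Tree-structured decomposition: only the low-pass branch is split again."""
    approx = list(x)
    details = []
    for _ in range(levels):
        approx, high = haar_step(approx)
        details.append(high)
    return approx, details

def reconstruct(approx, details):
    """Invert decompose() by merging bands from the deepest level outward."""
    s = 1.0 / math.sqrt(2.0)
    x = list(approx)
    for high in reversed(details):
        merged = []
        for lo, hi in zip(x, high):
            merged.extend([s * (lo + hi), s * (lo - hi)])
        x = merged
    return x

# Hypothetical evolution of one harmonic magnitude over eight frames.
track = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
approx, details = decompose(track, levels=2)
restored = reconstruct(approx, details)
```

Because errors introduced in a deep band pass through every synthesis stage above it, a sketch like this also shows why quantisation errors can compound within the tree.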
|
SP-8.2
|
An Improved Mixed Excitation Linear Prediction (MELP) Coder
Takahiro Unno,
Thomas P Barnwell III (Center for Signal and Image Processing, School of Electrical and Computer Engineering, Georgia Institute of Technology),
Kwan Truong (Atlanta Signal Processors, Inc.)
This paper presents an improved Mixed Excitation Linear
Prediction (MELP) coder.
The MELP is the linear-prediction-based speech coder
that was recently chosen as the new 2400 bps U.S.
Federal Standard. Even though the MELP is quite good,
there are still some perceivable distortions, particularly
around non-stationary speech segments and for some
low-pitch male speakers. The key features of our new
coder include a robust pitch detection algorithm, a new
plosive analysis/synthesis method, and a post processor
for the Fourier magnitude model. Formal quality tests
are used to show that the new MELP improves the quality
of the U.S. Federal Standard MELP coder while requiring
only a small increase in algorithmic delay and while
also retaining compatibility with the Federal Standard
MELP bit-stream specification.
|
SP-8.3
|
Split Band LPC Based Adaptive Multi-Rate GSM Candidate
Stephane Villette,
Milos Stefanovic,
Ahmet Kondoz (CCSR, University of Surrey, UK)
The European Telecommunications Standards Institute (ETSI) has launched
a competition for a new mobile communications standard designed to
provide better performance than the current GSM standard. This standard is
to be called AMR for Adaptive Multi-Rate: the source and channel coding
rates can be adapted depending on the state of the channel, thus providing
an optimal balance between them at any time. The University of Surrey has
submitted a candidate for this competition through the Mobile VCE. This
candidate was the only one amongst eleven to use a vocoder in the half-rate
GSM channel instead of a CELP based coder. The testing which took place as
part of the first stage of the competition has shown that this candidate
was among the best. This paper presents the system submitted for the
half-rate channel as well as the results of the testing.
|
SP-8.4
|
Frequency-Domain Spectral Envelope Estimation for Low Rate Coding of Speech
Milan Jelinek,
Jean-Pierre Adoul (University of Sherbrooke)
Estimating the spectral envelope in the frequency domain avoids some problems of Linear Prediction (LP) algorithms for voiced speech. We present a low-complexity method of spectral envelope estimation from harmonics for low rate coding. The method consists of computing the harmonic amplitude spectrum using a pitch-synchronous DFT whose length depends on voicing, modifying this spectrum outside the telephone bandwidth to simplify modeling of the useful bandwidth, and interpolating it with a frequency-domain low-pass filter. An all-pole model is then fitted to this modified, smoothed version of the harmonic spectrum. The method was implemented in the Harmonic-Stochastic Excitation (HSX) vocoder and its performance was compared with an LP algorithm similar to that used in the G.729 speech coding standard. A-B comparative tests show a significant increase in perceptual quality.
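The first step, reading harmonic amplitudes from a pitch-synchronous DFT, can be sketched as follows. The sketch assumes the analysis window is exactly one pitch cycle, so that DFT bin k falls on harmonic k. The function name `harmonic_amplitudes` is hypothetical, and the voicing-dependent length, out-of-band modification, smoothing, and all-pole fit described in the abstract are not reproduced.

```python
import cmath
import math

def harmonic_amplitudes(cycle):
    """Amplitudes of harmonics 1..P//2 from a DFT over one pitch cycle of P samples."""
    P = len(cycle)
    amps = []
    for k in range(1, P // 2 + 1):
        X = sum(cycle[n] * cmath.exp(-2j * math.pi * k * n / P) for n in range(P))
        # A real sinusoid splits its energy over bins k and P-k, except at Nyquist.
        scale = 1.0 if 2 * k == P else 2.0
        amps.append(scale * abs(X) / P)
    return amps

# Synthetic pitch cycle: first harmonic at amplitude 1.0, third at 0.5.
P = 40
cycle = [
    1.0 * math.cos(2 * math.pi * n / P)
    + 0.5 * math.cos(2 * math.pi * 3 * n / P + 0.3)
    for n in range(P)
]
amps = harmonic_amplitudes(cycle)
```

Here `amps[0]` and `amps[2]` recover the amplitudes of the first and third harmonics, and the remaining entries are close to zero.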
|
SP-8.5
|
Robust Closed-Loop Pitch Estimation for Harmonic Coders by Time Scale Modification
Chunyan Li,
Vladimir Cuperman,
Allen Gersho (UCSB, ECE Dept.)
Harmonic coders that synthesize speech without transmitting phase information
abandon the benefits of closed-loop parameter estimation via waveform matching.
In this paper, we show that effective closed loop parameter estimation can be
achieved when a suitable time-scale modification is applied to the speech LP
residual in harmonic coders. The concept is demonstrated here specifically for
pitch estimation, but is more broadly applicable. For each of a set of pitch
candidates generated by a time-domain pitch estimator, the residual is modified
to match the pitch contour derived from that candidate. The best candidate is
selected by evaluating for each candidate the match between the modified
residual and the synthesized residual. The new pitch estimation algorithm
significantly reduces gross pitch errors compared to a conventional time-domain
pitch estimator and enhances the perceptual performance of a 4 kbps harmonic
coder.
|
SP-8.6
|
Phase Adjustment in Waveform Interpolation
Hong-Goo Kang,
D. Sen (AT&T Labs-Research)
This paper describes a method of improving the quality of the Waveform
Interpolation (WI) speech coder by adjustment of the phase information.
In WI, a slowly-evolving waveform (SEW) and a rapidly-evolving waveform
(REW) represent the periodic and the non-periodic part of the signal.
The phase of the synthesized signal is determined by the SEW and REW,
and thus the correct quantization of these parameters is important for
producing natural speech quality.
A method is described, whereby the phase of the synthesized signal is
adjusted by modifying the quantized REW spectrum as a function of the
fundamental frequency. This essentially attempts to correct the discrepancies
in phase that arise due to variation in pitch and also accounts for the
difference in noise sensitivity between female and male speech.
The overall effect would be the same if multiple codebooks (depending
on pitch) were used to code the REW spectrum. Experimental results confirm
that the new method results in significantly improved performance.
|
SP-8.7
|
A Low Resolution Pulse Position Coding Method for Improved Excitation Modeling of Speech Transition
Jongseo Sohn,
Wonyong Sung (School of Electrical Engineering, Seoul National University)
We propose a new excitation model for transitional speech to reduce the distortion caused by the traditional two-source (voiced/unvoiced) excitation model. The proposed low resolution pulse position coding (LRPPC) algorithm detects pulses in frames of weak periodicity, which are classified as unvoiced, and transmits their approximate positions. In the decoder, dispersed pulses that have a flat magnitude spectrum are synthesized at the decoded positions to form the excitation signal. A subjective quality test shows that the vocoder employing the LRPPC algorithm produces better speech quality and is very robust to mode decision errors.
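One way to obtain a dispersed pulse with a flat magnitude spectrum is to set every DFT bin to unit magnitude, assign a non-linear phase, and take the inverse DFT. The sketch below does this with a quadratic phase, which is an assumption made for illustration; the abstract does not say which phase the authors use. The names `dispersed_pulse`, `place_pulses`, and the `spread` parameter are likewise hypothetical.

```python
import cmath
import math

def dispersed_pulse(length, spread=1.0):
    """Real pulse whose DFT has unit magnitude in every bin (quadratic phase assumed)."""
    N = length
    spec = [0j] * N
    spec[0] = 1 + 0j
    for k in range(1, (N - 1) // 2 + 1):
        phi = spread * math.pi * k * k / N
        spec[k] = cmath.exp(1j * phi)
        spec[N - k] = spec[k].conjugate()  # conjugate symmetry keeps the pulse real
    if N % 2 == 0:
        spec[N // 2] = 1 + 0j
    return [
        sum(spec[k] * cmath.exp(2j * math.pi * k * n / N) for k in range(N)).real / N
        for n in range(N)
    ]

def place_pulses(frame_len, positions, pulse):
    """Add a copy of the pulse starting at each decoded position, truncated at frame end."""
    exc = [0.0] * frame_len
    for p in positions:
        for i, v in enumerate(pulse):
            if p + i < frame_len:
                exc[p + i] += v
    return exc

pulse = dispersed_pulse(16)
excitation = place_pulses(80, [10, 47], pulse)
```

With `spread=0.0` the phase is zero and the result collapses to a single unit impulse, so the parameter controls how far the pulse energy is smeared in time.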
|
SP-8.8
|
Dispersion Phase Vector Quantization for Enhancement of Waveform Interpolative Coder
Oded Gottesman (Signal Compression Laboratory, Department of Electrical and Computer Engineering, University of California, Santa Barbara, CA 93106, USA)
This paper presents an efficient analysis-by-synthesis vector quantizer for the dispersion phase of the excitation signal, which was used to enhance a waveform-interpolative coder. The scheme can be used to enhance other harmonic coders, such as the sinusoidal-transform coder and the multiband-excitation coder. The scheme incorporates perceptual weighting and does not require any phase unwrapping. The proposed quantizer achieves a segmental signal-to-noise ratio of up to 14 dB with as few as 6 bits. Subjective testing shows that the quantized phase improves synthesized speech quality over a phase extracted from a male speaker. The improvement was larger for female speakers.
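For readers unfamiliar with the metric, segmental signal-to-noise ratio is commonly computed as the mean of per-segment SNR values in dB. The sketch below is a plain, unweighted time-domain version. The function name `segmental_snr`, the segment length, and the rule of skipping silent or error-free segments are assumptions, and the abstract does not state the exact definition or weighting used in the paper.

```python
import math

def segmental_snr(ref, test, seg_len):
    """Mean per-segment SNR in dB between a reference and a test signal."""
    snrs = []
    for start in range(0, len(ref) - seg_len + 1, seg_len):
        r = ref[start:start + seg_len]
        t = test[start:start + seg_len]
        sig = sum(a * a for a in r)
        err = sum((a - b) ** 2 for a, b in zip(r, t))
        if sig == 0.0 or err == 0.0:
            continue  # assumption: skip segments where the ratio is undefined
        snrs.append(10.0 * math.log10(sig / err))
    return sum(snrs) / len(snrs) if snrs else float("nan")

# A constant 10% amplitude error gives a signal-to-error energy ratio of 100.
ref = [1.0] * 8
test = [0.9] * 8
value = segmental_snr(ref, test, seg_len=4)
```

In this toy case every segment has the same ratio, so the segmental value is about 20 dB.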
|
|