9:30, SPEECH-P4.1
APPROXIMATING AND EXPLOITING THE RESIDUAL REDUNDANCIES -- APPLICATIONS TO EFFICIENT RECONSTRUCTION OF SPEECH OVER NOISY CHANNELS
F. LAHOUTI, A. KHANDANI
Exploiting the residual redundancy in a source coder output
stream during the decoding process has proven to be a bandwidth-efficient
way to combat noisy-channel degradations.
In this paper, we consider soft reconstruction of LSF parameters
in the IS-641 CELP coder transmitted over a noisy channel.
We propose two schemes. The first scheme
attempts to exploit the interframe residual redundancies in the sequence of
received parameters. The second approach exploits both interframe and
intraframe residual redundancies. Simulation results are provided
that demonstrate the efficiency of the algorithms.
Another issue addressed here is a methodology to efficiently approximate and store
the residual redundancies or the a priori transition probabilities.
For quantizers with high rates, calculating these probabilities requires
a huge number of source samples, and storing them requires a
large amount of memory. These issues can make the
decoder design process impractical.
The proposed method is based on the classification of the signal domain.
The presented schemes provide high quality error concealment solutions
for CELP coders.
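The idea of using a priori transition probabilities during decoding can be illustrated with a minimal sketch. It is not the authors' scheme: it assumes a first-order Markov model on the quantizer index, a memoryless binary symmetric channel with a known bit error rate strictly between 0 and 1, and a scalar codebook. The names soft_decode, channel_likelihood, and transition are hypothetical.

```python
def channel_likelihood(received, index, n_bits, ber):
    """P(received | index) for a memoryless binary symmetric channel (assumed)."""
    flipped = bin(received ^ index).count("1")
    return (ber ** flipped) * ((1.0 - ber) ** (n_bits - flipped))


def soft_decode(received_seq, codebook, transition, n_bits, ber):
    """MMSE reconstruction that exploits interframe redundancy.

    transition[j][i] is the assumed a priori probability of index i
    in the current frame given index j in the previous frame.
    """
    n = len(codebook)
    posterior = [1.0 / n] * n  # no knowledge before the first frame
    estimates = []
    for received in received_seq:
        # Predict the current index distribution from the previous posterior.
        predicted = [
            sum(transition[j][i] * posterior[j] for j in range(n))
            for i in range(n)
        ]
        # Weight the prediction by the channel likelihood of the received bits.
        weighted = [
            channel_likelihood(received, i, n_bits, ber) * predicted[i]
            for i in range(n)
        ]
        total = sum(weighted)
        posterior = [w / total for w in weighted]
        # The estimate is the posterior mean of the codebook entries.
        estimates.append(sum(p * c for p, c in zip(posterior, codebook)))
    return estimates
```

The transition table has one entry per pair of indices, so its size grows with the square of the codebook size. This is the storage problem that the abstract says becomes impractical for high-rate quantizers.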
9:30, SPEECH-P4.2
CHANNEL OPTIMIZED MATRIX QUANTIZATION (COMQ) FOR LSP PARAMETERS OVER WAVEFORM CHANNELS
J. PÉREZ-CÓRDOBA, A. RUBIO, J. LÓPEZ-SOLER, V. SÁNCHEZ
Combined source and channel coding is a technique to mitigate
channel errors without increasing the bit rate. Channel
optimized vector quantizers (COVQ) achieve
these objectives in the context of vector quantization.
This paper presents a study of a channel optimized matrix quantizer
(COMQ), applied to the Line Spectral Pair (LSP)
parameters as an extension of the COVQ technique.
Gaussian and slow-fading Rayleigh channels are considered, and
GMSK (Gaussian Minimum Shift-Keying) is used as the modulation
technique. Several channel signal-to-noise ratios (CSNRs) are considered to measure the performance of this system.
In addition, for comparison purposes, the performance of other
schemes for quantizing the LSP parameters is computed.
9:30, SPEECH-P4.3
HYBRID MULTI-MODE/MULTI-RATE CS-ACELP SPEECH CODING FOR ADAPTIVE VOICE OVER IP
G. RUGGERI, F. BERITELLI, S. CASALE
This paper presents a hybrid Multi-Mode/Multi-Rate, toll quality
CS-ACELP coder developed for Voice over IP applications. The coder
uses coding modes compatible with the three 6.4, 8, and 11.8
kbit/s coding schemes standardised by ITU-T in G.729. In
particular, the algorithm presents 4 coding categories, with an
average bit rate ranging between about 3 and 8 kbit/s, that adapt
the rate to changes in network conditions.
9:30, SPEECH-P4.4
IMPROVED VOICE ACTIVITY DETECTION BASED ON A SMOOTHED STATISTICAL LIKELIHOOD RATIO
Y. CHO, K. AL-NAIMI, A. KONDOZ
This paper presents the behavioural mechanism of a statistical model-based voice activity
detector (VAD), featuring a likelihood ratio test for the activity decision.
From investigation of the VAD, it is found that detection errors could occur frequently
at speech offset regions because of the delay term in the decision-directed
parameter estimator, employed for the estimation of an unknown parameter of the likelihood ratio.
Hence, this paper proposes a smoothed likelihood ratio so as to alleviate the detection errors at the offset region.
Objective test results show that the proposed scheme is useful for achieving a considerable performance
improvement for the VAD.
Additionally, the proposed VAD gives detection performance superior to the G.729B VAD and comparable with AMR VAD option 2.
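The effect of smoothing the likelihood ratio can be sketched as follows. This is an illustration under assumptions, not the paper's exact formulation: the per-frame log likelihood ratios are taken as given, the smoothing is assumed to be a first-order recursion in the log domain, and the names smooth_log_lr, decide, kappa, and threshold are hypothetical.

```python
def smooth_log_lr(log_lrs, kappa=0.9):
    """First-order recursive smoothing of per-frame log likelihood ratios.

    kappa (assumed, 0 <= kappa < 1) controls how much of the previous
    smoothed value is retained in the current frame.
    """
    smoothed = []
    state = 0.0
    for value in log_lrs:
        state = kappa * state + (1.0 - kappa) * value
        smoothed.append(state)
    return smoothed


def decide(log_lrs, threshold):
    """Declare speech activity wherever the statistic exceeds the threshold."""
    return [value > threshold for value in log_lrs]


# Ten active frames followed by three frames where the raw ratio collapses,
# as could happen at a speech offset.
raw = [4.0] * 10 + [0.0] * 3
raw_decisions = decide(raw, threshold=0.4)
smoothed_decisions = decide(smooth_log_lr(raw, kappa=0.5), threshold=0.4)
```

In this toy sequence the raw statistic drops below the threshold as soon as the ratio collapses, while the smoothed statistic decays gradually and keeps the last frames marked as active.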
9:30, SPEECH-P4.5
MULTIPLEXED PREDICTIVE CODING OF SPEECH
S. ANDERSEN, G. KUBIN
In this paper we present a novel method for predictive coding with
application to transmission of speech over packet-switched
networks. Our method uses multiplexing to distribute a part of the
information about a segment of each speech signal in several data
packets while keeping the data packet rate and payload for that part
of the information unchanged. We investigate three multiplexing
schemes: packet hopping, Hadamard multiplexing, and an extension
of Hadamard multiplexing that exploits a nonlinear preprocessing
and estimation method. We show by means of formal AB-preference
tests that multiplexed predictive coding can lead to coders that are
more robust to packet losses than scalar quantization and packet loss
concealment according to the G.711 standard.
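The general principle of spreading information over several packets with a Hadamard transform can be sketched as below. How the paper applies the transform inside the predictive coder is not stated in the abstract, so this is only a generic illustration: it assumes one transform coefficient per packet, a block length that is a power of two, and a lost packet replaced by zero. The names hadamard and multiplex are hypothetical.

```python
def hadamard(n):
    """Sylvester-type Hadamard matrix of order n (n must be a power of two)."""
    if n < 1 or n & (n - 1):
        raise ValueError("order must be a power of two")
    h = [[1]]
    while len(h) < n:
        h = [row + row for row in h] + [row + [-x for x in row] for row in h]
    return h


def multiplex(values):
    """Orthonormal Hadamard transform; applying it twice restores the input."""
    n = len(values)
    h = hadamard(n)
    scale = n ** -0.5
    return [
        scale * sum(h[i][j] * values[j] for j in range(n))
        for i in range(n)
    ]


block = [1.0, 2.0, 3.0, 4.0]
packets = multiplex(block)           # one coefficient per packet (assumed)
packets[0] = 0.0                     # simulate the loss of the first packet
recovered = multiplex(packets)       # the same transform acts as the inverse
errors = [b - r for b, r in zip(block, recovered)]
```

Because every coefficient depends on every input value, losing one packet produces an error of equal magnitude in all reconstructed values rather than erasing one value completely.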
9:30, SPEECH-P4.6
PARAMETER INTERPOLATION TO ENHANCE THE FRAME ERASURE ROBUSTNESS OF CELP CODERS IN PACKET NETWORKS
J. WANG, J. GIBSON
Frame erasure (FE) robustness is an important quality measure for voice over IP (VoIP) networks. Recovery of the erased frames from the received information is crucial to realize this robustness. We allow the lost frames to be recovered from both the ``previous'' and ``next'' good frames. We first give quantitative distortion comparisons between predictive and interpolative frame recovery. Then we add FE-robust LSF coding modes to the popular ITU G.723.1 and G.729 CELP coders. These FE-robust modes utilize intraframe LSF VQ and incur no bit-rate increase for the G.723.1 coder and a small increase (0.4 kb/s) for G.729. Simulations show that FE-robust coding with interpolation achieves average spectral distortions 0.7-1.8 dB smaller than those of the original coders. Significant quality improvement was achieved by combined implementation of FE-robust coding, LSF and pitch interpolation, and a proposed fixed codebook excitation recovery method.
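The difference between predictive and interpolative recovery can be sketched as follows. The abstract does not give the recovery formulas, so this assumes the simplest forms: repetition of the previous good LSF vector for the predictive case, and linear interpolation between the previous and next good vectors for the interpolative case. The function names are hypothetical.

```python
def conceal_by_repetition(prev_lsf, n_lost):
    """Predictive-style recovery: reuse the last good LSF vector."""
    return [list(prev_lsf) for _ in range(n_lost)]


def conceal_by_interpolation(prev_lsf, next_lsf, n_lost):
    """Interpolative recovery between the previous and next good frames.

    The k-th lost frame (k = 1..n_lost) is placed at the fraction
    k / (n_lost + 1) of the way from prev_lsf to next_lsf.
    """
    frames = []
    for k in range(1, n_lost + 1):
        w = k / (n_lost + 1)
        frames.append(
            [(1.0 - w) * p + w * n for p, n in zip(prev_lsf, next_lsf)]
        )
    return frames


previous_good = [0.2, 0.6]
next_good = [0.4, 1.0]
repeated = conceal_by_repetition(previous_good, 1)
interpolated = conceal_by_interpolation(previous_good, next_good, 1)
```

Interpolation can only run once the next good frame has arrived, so it trades additional delay for the use of information from both sides of the erasure.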
9:30, SPEECH-P4.7
A SPEECH SPECTRUM DISTORTION MEASURE WITH INTERFRAME MEMORY
F. NORDÉN, T. ERIKSSON
In this paper we present a novel spectral distortion measure with interframe memory. The memory makes it possible to take into account the dynamics of the time evolution of the speech spectrum, which have been shown to be significantly important for the perceived speech quality.
Memory is introduced by linear filtering of the time evolution of the
difference log spectrum. This facilitates smoothing of the spectrum while retaining the ability to track quick transitions.
Our results point to substantially improved performance when rapidly
evolving spectrum errors are penalized in the measure.
9:30, SPEECH-P4.8
ESTIMATION OF MISSING LSF PARAMETERS USING GAUSSIAN MIXTURE MODELS
R. MARTIN, C. HOELPER, I. WITTKE
Speech transmission over packet networks has to cope with packet delays and packet losses.
When a packet loss occurs, the missing information must be estimated.
In this contribution we focus on restoring the spectral parameters of a speech coder.
A novel approach to estimating missing Line Spectral Frequency (LSF)
parameters using Gaussian Mixture Models (GMM) is proposed. We present
the estimation algorithm and study its performance when one or several LSF parameters are
lost. We show that a GMM of a relatively low order (approx. 20) is sufficient to achieve a
substantial improvement in parameter SNR. Therefore, the new estimation procedure requires much
less memory than histogram-based estimation methods.
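Estimating missing parameters from a GMM can be sketched as a conditional mean computation. The abstract does not specify the covariance structure or the estimator, so this sketch assumes diagonal covariances and an MMSE estimate; under that assumption the received parameters only reweight the mixture components. The name estimate_missing and its argument layout are hypothetical.

```python
import math


def estimate_missing(observed, weights, means, variances):
    """MMSE estimate of missing dimensions under a diagonal-covariance GMM.

    observed maps a dimension index to its received value. The return value
    maps each missing dimension index to its estimate.
    """
    dim = len(means[0])
    missing = [d for d in range(dim) if d not in observed]
    # Log posterior of each component given only the observed dimensions.
    log_post = []
    for w, mu, var in zip(weights, means, variances):
        lp = math.log(w)
        for d, x in observed.items():
            lp -= 0.5 * (math.log(2.0 * math.pi * var[d])
                         + (x - mu[d]) ** 2 / var[d])
        log_post.append(lp)
    # Normalize in a numerically stable way.
    peak = max(log_post)
    post = [math.exp(lp - peak) for lp in log_post]
    total = sum(post)
    post = [p / total for p in post]
    # With diagonal covariances the conditional mean of a missing dimension
    # is the posterior-weighted average of the component means.
    return {d: sum(p * mu[d] for p, mu in zip(post, means)) for d in missing}
```

The model is stored as one weight, one mean vector, and one variance vector per component, which is the kind of compact representation the abstract contrasts with histogram-based methods.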
9:30, SPEECH-P4.9
PERCEPTUAL EVALUATION OF SPEECH QUALITY (PESQ) - A NEW METHOD FOR SPEECH QUALITY ASSESSMENT OF TELEPHONE NETWORKS AND CODECS
A. RIX, J. BEERENDS, M. HOLLIER, A. HEKSTRA
Previous objective speech quality assessment models, such as Bark spectral distortion (BSD), the perceptual speech quality measure (PSQM), and measuring normalizing blocks (MNB), have been found to be suitable for assessing only a limited range of distortions. A new model has therefore been developed for use across a wider range of network conditions, including analogue connections, codecs, packet loss and variable delay. Known as perceptual evaluation of speech quality (PESQ), it is the result of integration of the perceptual analysis measurement system (PAMS) and PSQM99, an enhanced version of PSQM. PESQ is expected to become a new ITU-T recommendation, P.862, replacing P.861, which specified PSQM and MNB.
9:30, SPEECH-P4.10
SOURCE-DRIVEN PACKET MARKING FOR SPEECH TRANSMISSION OVER DIFFERENTIATED SERVICES NETWORKS
J. DE MARTIN
We present a source-driven approach to packet marking for
speech transmission over packet networks implementing the
Differentiated Services model. Packets generated by the speech
coder are examined: if deemed perceptually critical, they are marked
as premium and sent on a ``virtual wire''; otherwise,
they are sent as regular best-effort traffic. Applied to speech
coded with the ITU-T 8 kb/s speech coding standard G.729, the
proposed source-driven packet marking scheme outperforms
source-transparent techniques and provides clearly better
perceptual quality than the unprotected case while sending as little
as 1/5 of the coded bitstream as premium traffic. Audio
samples are available at http://demartin.polito.it/icassp2001/.