Abstract: Session SP-10 |
|
SP-10.1
|
SPEAKER VERIFICATION PERFORMANCE AND THE LENGTH OF TEST SENTENCE
Jialong He,
Li Liu (Dept. Speech & Hearing Science, Arizona State University)
It is known that the performance of a speaker verification system improves with the length of the test sentences. However, little is known about the exact relation between performance and test length. This makes it difficult to compare results from studies in which different test lengths were used to evaluate the systems. In this paper, we propose a method to calculate the verification error rates at any test-sentence length, provided that the error rates at two different lengths are given. The accuracy of this calculation method is demonstrated with a speaker verification experiment and with results reported in the literature. Good agreement is shown between the calculated values and those measured through experiments.
|
SP-10.2
|
ON THE USE OF SOME DIVERGENCE MEASURES IN SPEAKER RECOGNITION
Rivarol Vergin (Universite de Moncton),
Douglas O'Shaughnessy (INRS-Telecommunications)
The first motivation for using Gaussian Mixture Models
for text-independent speaker identification is based
on the observation that a linear combination of
Gaussian basis functions is capable of representing
a large class of sample distributions. While this
technique generally gives good results, little is known
about which specific part of a speech signal best
identifies a speaker. This contribution suggests a
procedure, based on the Jensen divergence measure, to
automatically extract from the input speech signal the
part that contributes most to identifying a speaker.
Experiments conducted using the Spidre database indicate
a significant improvement in the performance of the
speaker recognition system.
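
As a rough illustration of the kind of measure involved, the sketch below computes the Jensen-Shannon divergence between two discrete distributions. The abstract does not define its "Jensen divergence measure", so treating it as this form is an assumption, and the function names are hypothetical.

```python
import math


def kl_divergence(p, q):
    """Kullback-Leibler divergence KL(p || q), in nats, for discrete distributions."""
    # Terms with p_i == 0 contribute nothing by convention.
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def jensen_shannon(p, q):
    """Jensen-Shannon divergence: average KL divergence of p and q to their midpoint."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)
```

The measure is symmetric, zero for identical distributions, and bounded by ln 2, which makes it convenient for ranking speech segments by how strongly their statistics differ.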
|
SP-10.3
|
Improving a GMM Speaker Verification System by Phonetic Weighting
Roland Auckenthaler,
Eluned S Parris,
Michael J Carey (Ensigma Ltd)
This paper compares two approaches to speaker verification: Gaussian mixture models (GMMs) and hidden Markov models (HMMs). The GMM-based system outperformed the HMM system, mainly because the GMM made better use of the training data. The best-scoring GMM frames were strongly correlated with particular phonemes, e.g., vowels and nasals. Two techniques were used to try to exploit the different amounts of discrimination provided by the phonemes to improve the performance of the GMM-based system. Applying linear weighting to the phonemes showed that fewer than half of the phonemes were contributing to the overall system performance. Using an MLP to weight the phonemes provided a significant improvement in performance for male speakers, but no improvement has yet been achieved for women.
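
The linear weighting idea can be sketched as a weighted average of per-frame scores, where each frame's weight depends on its phoneme label. This is a minimal sketch under assumptions: the function name, the phoneme labels, and the use of a normalized weighted mean are all hypothetical and are not taken from the paper.

```python
def weighted_utterance_score(frame_scores, frame_phonemes, phoneme_weights,
                             default_weight=1.0):
    """Combine per-frame scores into one utterance score using phoneme weights.

    frame_scores:    per-frame log-likelihood ratios (assumed already computed)
    frame_phonemes:  phoneme label of each frame (assumed to come from an aligner)
    phoneme_weights: dict mapping phoneme label to a non-negative weight
    """
    total = 0.0
    weight_sum = 0.0
    for score, phoneme in zip(frame_scores, frame_phonemes):
        w = phoneme_weights.get(phoneme, default_weight)
        total += w * score
        weight_sum += w
    if weight_sum == 0:
        raise ValueError("all frames have zero weight")
    return total / weight_sum
```

Setting a phoneme's weight to zero removes its frames from the decision, which is one way to express the finding that only some phonemes contribute to performance.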
|
SP-10.4
|
A HYBRID SCORE MEASUREMENT FOR HMM-BASED SPEAKER VERIFICATION
Yong Gu,
Trevor Thomas (Vocalis Ltd., UK)
In speaker verification, the world-model-based approach and the cohort-model-based approach have been used to obtain better HMM score measurements for the verification comparison. Theoretical analysis shows that these two approaches represent two different paradigms for the verification decision-making strategy. The two techniques could be combined for a better solution. In this paper we present a hybrid score measurement that combines the world-model-based technique and the cohort-model-based technique. The method is evaluated with the YOHO database. The results show that the combination can lead to a better score measurement, which improves speaker verification performance. An experimental comparison between the world-model-based approach and the cohort-model-based approach on the YOHO database can also be found in the paper.
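
The two normalizations and one possible way of combining them can be sketched as follows. The abstract does not say how the hybrid score is formed, so the linear interpolation with a hypothetical parameter `alpha` is an assumption, as are the function names and the use of the cohort mean.

```python
def world_score(ll_claimant, ll_world):
    """Claimant log-likelihood normalized by a single world (background) model."""
    return ll_claimant - ll_world


def cohort_score(ll_claimant, ll_cohort):
    """Claimant log-likelihood normalized by the mean over cohort speaker models."""
    return ll_claimant - sum(ll_cohort) / len(ll_cohort)


def hybrid_score(ll_claimant, ll_world, ll_cohort, alpha=0.5):
    """Assumed combination: linear interpolation of the two normalized scores."""
    return (alpha * world_score(ll_claimant, ll_world)
            + (1.0 - alpha) * cohort_score(ll_claimant, ll_cohort))
```

With `alpha=1.0` the sketch reduces to pure world-model normalization and with `alpha=0.0` to pure cohort normalization, so the same threshold sweep can compare all three.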
|
SP-10.5
|
Polynomial Classifier Techniques for Speaker Verification
William M Campbell (Motorola SSG),
Khaled T Assaleh (Rockwell Semiconductor Systems)
Modern speaker verification applications require high accuracy at low
complexity. We propose the use of a polynomial-based classifier to
achieve this objective. We demonstrate a new combination of
techniques which makes polynomial classification accurate
and powerful for speaker verification. We show that discriminative
training of polynomial classifiers can be performed on large data
sets. A prior probability compensation method is detailed which
increases accuracy and normalizes the output score range. Results are
given for the application of the new methods to YOHO.
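
The scoring side of a polynomial classifier can be sketched as below: each feature vector is expanded into monomials, and the utterance score is the average inner product with a weight vector. This is a minimal sketch that assumes the weights have already been trained; the discriminative training and the prior compensation described in the abstract are not shown, and all names are hypothetical.

```python
from itertools import combinations_with_replacement
from math import prod


def polynomial_expansion(x, degree=2):
    """Return all monomials of the entries of x up to `degree`, starting with 1."""
    terms = []
    for d in range(degree + 1):
        for idx in combinations_with_replacement(range(len(x)), d):
            terms.append(prod(x[i] for i in idx))  # empty product is 1
    return terms


def utterance_score(frames, weights, degree=2):
    """Average per-frame classifier output over all frames of an utterance."""
    scores = [
        sum(w * t for w, t in zip(weights, polynomial_expansion(f, degree)))
        for f in frames
    ]
    return sum(scores) / len(scores)
```

For a two-dimensional frame `[2, 3]` the degree-2 expansion is `[1, 2, 3, 4, 6, 9]`, so the per-frame cost is a single dot product, which is consistent with the low-complexity goal stated above.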
|
SP-10.6
|
Channel-Robust Speaker Identification using Modified-Mean Cepstral Mean Normalization with Frequency Warping
Alvin A Garcia (SpeakEZ/T-NETIX, Inc.),
Richard J Mammone (CAIP Center, Rutgers University)
The performance of automatic speaker recognition systems is significantly
degraded by acoustic mismatches between training and testing conditions.
Such acoustic mismatches are commonly encountered in systems that operate on
speech collected over telephone networks, where different handsets and
different network routes impose varying convolutional distortions on the
speech signal.
A new algorithm, the Modified-Mean Cepstral Mean Normalization with
Frequency Warping (MMCMNFW) method, which improves upon the commonly-employed
Cepstral Mean Subtraction method, has been developed. Experimental results on
closed-set speaker identification tasks on a channel-corrupted subset of the
TIMIT database and on a subset of the NTIMIT database are presented.
The new algorithm is shown to offer improved recognition rates over other
existing channel normalization methods on these databases.
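
For context, the baseline Cepstral Mean Subtraction step can be sketched as follows. A fixed convolutional channel appears as an additive constant in the cepstral domain, so subtracting the per-utterance mean removes it. This shows only the baseline, not the MMCMNFW algorithm, whose details are not given in the abstract; the function name is hypothetical.

```python
def cepstral_mean_subtraction(frames):
    """Subtract the per-coefficient mean over all frames of one utterance.

    frames: list of cepstral vectors (lists of floats), all of equal length.
    """
    n = len(frames)
    dim = len(frames[0])
    mean = [sum(f[k] for f in frames) / n for k in range(dim)]
    return [[f[k] - mean[k] for k in range(dim)] for f in frames]


# A constant channel offset added to every frame is removed by the normalization.
clean = [[1.0, 2.0], [3.0, 6.0]]
offset = [0.5, -1.0]
distorted = [[c + o for c, o in zip(f, offset)] for f in clean]
normalized_clean = cepstral_mean_subtraction(clean)
normalized_distorted = cepstral_mean_subtraction(distorted)
```

Both normalized sequences are identical here because the offset is the same for every frame; real handset and network distortions are only approximately constant, which is the gap that improved methods aim to close.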
|
SP-10.7
|
Feature Selection Using Genetics-Based Algorithm and Its Application to Speaker Identification
Mubeccel Demirekler (Electrical & Electronics Eng. Dept., Middle East Technical University),
Ali Haydar (Electrical & Electronics Eng. Dept., Eastern Mediterranean University)
This paper introduces the use of a genetics-based algorithm to reduce a 24-parameter set (i.e., the base set) to a 5-, 6-, 7-, 8-, or 10-parameter set for each speaker in text-independent speaker identification. Feature selection is done by finding the features that best discriminate a person from his/her two closest neighbors. The experimental results show an approximately 5% increase in the recognition rate when the reduced set of parameters is used. In addition, the amount of calculation required in the testing phase is much lower with the reduced feature set than with the complete feature set. Hence it is more desirable to use the subset of the complete feature set found by the suggested genetic algorithm.
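
A generic genetic search over fixed-size feature subsets can be sketched as below. The abstract does not describe the operators or the fitness function, so the truncation selection, union-based crossover, single-swap mutation, and the caller-supplied `fitness` are all assumptions made for illustration; in the paper's setting the fitness would measure how well a subset separates a speaker from the two closest neighbors.

```python
import random


def select_features(n_features, subset_size, fitness,
                    generations=30, pop_size=20, mutation_rate=0.3, seed=0):
    """Search for a subset of `subset_size` feature indices that maximizes `fitness`.

    fitness: hypothetical caller-supplied function mapping a sorted tuple of
             feature indices to a number (higher is better).
    """
    rng = random.Random(seed)
    population = [tuple(sorted(rng.sample(range(n_features), subset_size)))
                  for _ in range(pop_size)]
    for _ in range(generations):
        # Truncation selection: the better half survives unchanged (elitism).
        ranked = sorted(population, key=fitness, reverse=True)
        parents = ranked[:pop_size // 2]
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            # Crossover: draw the child's features from the union of both parents.
            pool = sorted(set(a) | set(b))
            child = set(rng.sample(pool, subset_size))
            # Mutation: occasionally swap one selected feature for an unused one.
            if rng.random() < mutation_rate:
                unused = [i for i in range(n_features) if i not in child]
                if unused:
                    child.remove(rng.choice(sorted(child)))
                    child.add(rng.choice(unused))
            children.append(tuple(sorted(child)))
        population = parents + children
    return max(population, key=fitness)
```

A call such as `select_features(24, 5, fitness)` would be run once per speaker, matching the per-speaker reduction of the 24-parameter base set described above.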
|