Authors:
Jialong He,
Li Liu,
Page (NA) Paper number 1021
Abstract:
It is known that the performance of a speaker verification system improves
with the length of test sentences. However, little is known about the
exact relation between the performance and the test length. This makes
it difficult to compare the results from various studies in which different
test lengths have been used to evaluate the systems. In this paper,
we propose a method to calculate the verification error rates
at any test sentence length, provided that the error rates at two
different lengths are given. The accuracy of this calculation method
is demonstrated with a speaker verification experiment and with
results reported in the literature. Good agreement is shown between the
calculated values and those measured through experiments.
Authors:
Rivarol Vergin,
Douglas O'Shaughnessy,
Page (NA) Paper number 1336
Abstract:
The primary motivation for using Gaussian Mixture Models for text-independent
speaker identification is the observation that a linear combination
of Gaussian basis functions is capable of representing a large class
of sample distributions. While this technique generally gives good
results, little is known about which specific part of a speech signal
best identifies a speaker. This contribution suggests a procedure,
based on the Jensen divergence measure, to automatically extract from the
input speech signal the part that contributes most to identifying a speaker.
Experiments conducted using the Spidre database indicate a significant
improvement in the performance of the speaker recognition system.
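
The abstract does not define the divergence measure or say how it is applied to the speech signal. As a reminder of the kind of quantity involved, and assuming the measure belongs to the Jensen-Shannon family (an assumption, not something the abstract confirms), the sketch below computes the Jensen-Shannon divergence between two discrete distributions. The function names are illustrative.

```python
import math


def entropy(p):
    """Shannon entropy (natural log) of a discrete distribution."""
    return -sum(x * math.log(x) for x in p if x > 0)


def jensen_shannon_divergence(p, q):
    """Entropy of the average distribution minus the average entropy.

    Assumes p and q have the same length and each sums to 1.
    """
    m = [(a + b) / 2 for a, b in zip(p, q)]
    return entropy(m) - (entropy(p) + entropy(q)) / 2


# Identical distributions give 0; disjoint ones give the maximum, log(2).
same = jensen_shannon_divergence([0.2, 0.8], [0.2, 0.8])
disjoint = jensen_shannon_divergence([1.0, 0.0], [0.0, 1.0])
```

A segment-selection procedure would presumably rank parts of the signal by some such divergence, but that step is not described in the abstract and is not sketched here.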
Authors:
Roland Auckenthaler,
Eluned S Parris,
Michael J Carey,
Page (NA) Paper number 1440
Abstract:
This paper compares two approaches to speaker verification: Gaussian
mixture models (GMMs) and hidden Markov models (HMMs). The GMM-based
system outperformed the HMM system, mainly because of the GMM's ability
to make better use of the training data. The best-scoring
GMM frames were strongly correlated with particular phonemes, e.g. vowels
and nasals. Two techniques were used to try to exploit the different
amounts of discrimination provided by the phonemes to improve the performance
of the GMM-based system. Applying linear weighting to the phonemes
showed that fewer than half of the phonemes were contributing to the
overall system performance. Using an MLP to weight the phonemes provided
a significant improvement in performance for male speakers, but no improvement
has yet been achieved for female speakers.
Authors:
Yong Gu, Vocalis Ltd., U.K.
Trevor Thomas, Vocalis Ltd., U.K.
Page (NA) Paper number 1636
Abstract:
In speaker verification, the world model based approach and the cohort
model based approach have both been used to obtain better HMM score
measurements for verification comparison. Theoretical analysis shows that
these two approaches represent two different paradigms of verification
decision-making strategy, and the two techniques can be combined for a
better solution. In this paper we present a hybrid score measurement which
combines the world model based technique and the cohort model based technique.
The method is evaluated with the YOHO database. The results show that
the combination can lead to a better score measurement, which improves
speaker verification performance. An experimental comparison between
the world model based approach and the cohort model based approach
with the YOHO database can also be found in the paper.
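
The abstract does not give the hybrid formula. The sketch below shows the two standard normalizations in their common log-likelihood-ratio form and then combines them with a simple weighted sum. The weighted sum, the `alpha` parameter, the use of the best cohort score, and all function names are assumptions made for illustration, not the authors' method.

```python
def world_score(ll_claimed, ll_world):
    """Log-likelihood of the claimed speaker's model relative to a world model."""
    return ll_claimed - ll_world


def cohort_score(ll_claimed, ll_cohort):
    """Log-likelihood relative to the best-scoring cohort model (one common choice)."""
    return ll_claimed - max(ll_cohort)


def hybrid_score(ll_claimed, ll_world, ll_cohort, alpha=0.5):
    """Hypothetical combination: weighted sum of the two normalized scores."""
    return (alpha * world_score(ll_claimed, ll_world)
            + (1 - alpha) * cohort_score(ll_claimed, ll_cohort))


# Made-up log-likelihoods for one test utterance.
score = hybrid_score(-40.0, -45.0, [-44.0, -42.0, -47.0])
accept = score > 0.0  # the threshold is also an assumption
```

With these made-up values the world-normalized score is 5.0, the cohort-normalized score is 2.0, and the combined score lies between them.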
Authors:
William M Campbell,
Khaled T Assaleh,
Page (NA) Paper number 1735
Abstract:
Modern speaker verification applications require high accuracy at low
complexity. We propose the use of a polynomial-based classifier to
achieve this objective. We demonstrate a new combination of techniques
which makes polynomial classification accurate and powerful for speaker
verification. We show that discriminative training of polynomial classifiers
can be performed on large data sets. A prior probability compensation
method is detailed which increases accuracy and normalizes the output
score range. Results are given for the application of the new methods
to YOHO.
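
As a rough illustration of what a polynomial-based classifier computes at test time, the sketch below expands each feature frame into its monomials up to a given degree, applies a weight vector, and averages the result over the utterance. The weights here are hand-picked placeholders; the discriminative training and the prior probability compensation mentioned in the abstract are not shown, and the function names are hypothetical.

```python
from itertools import combinations_with_replacement
from math import prod


def poly_expand(x, degree=2):
    """Return all monomials of the entries of x up to the given degree."""
    terms = [1.0]
    for d in range(1, degree + 1):
        for idx in combinations_with_replacement(range(len(x)), d):
            terms.append(prod(x[i] for i in idx))
    return terms


def utterance_score(frames, weights, degree=2):
    """Average the linear discriminant output w . p(x) over all frames."""
    scores = [
        sum(w * t for w, t in zip(weights, poly_expand(f, degree)))
        for f in frames
    ]
    return sum(scores) / len(scores)


# Two-dimensional frames expand to [1, x1, x2, x1*x1, x1*x2, x2*x2].
expanded = poly_expand([2.0, 3.0])
# Placeholder weights that simply pick out x1.
score = utterance_score([[2.0, 3.0], [4.0, 5.0]], [0.0, 1.0, 0.0, 0.0, 0.0, 0.0])
```

Because the score is linear in the expanded features, the per-frame work is a single dot product, which is in keeping with the low-complexity goal the abstract states.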
Authors:
Alvin A Garcia,
Richard J Mammone,
Page (NA) Paper number 2165
Abstract:
The performance of automatic speaker recognition systems is significantly
degraded by acoustic mismatches between training and testing conditions.
Such acoustic mismatches are commonly encountered in systems that operate
on speech collected over telephone networks, where different handsets
and different network routes impose varying convolutional distortions
on the speech signal. A new algorithm, the Modified-Mean Cepstral Mean
Normalization with Frequency Warping (MMCMNFW) method, which improves
upon the commonly employed Cepstral Mean Subtraction method, has been
developed. Experimental results on closed-set speaker identification
tasks on a channel-corrupted subset of the TIMIT database and on a
subset of the NTIMIT database are presented. The new algorithm is shown
to offer improved recognition rates over other existing channel normalization
methods on these databases.
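
For context, the baseline that the abstract compares against, Cepstral Mean Subtraction, can be sketched in a few lines. This is only the standard baseline, not the MMCMNFW algorithm, whose details the abstract does not give. A fixed convolutional channel appears as a constant additive offset in the cepstral domain, so subtracting the per-utterance mean removes it. The function name and the list-of-lists frame layout are assumptions for illustration.

```python
def cepstral_mean_subtraction(frames):
    """Subtract the per-coefficient mean taken over all frames of one utterance.

    frames: list of equal-length lists of cepstral coefficients (assumed layout).
    """
    if not frames:
        return []
    n_frames = len(frames)
    n_coeffs = len(frames[0])
    means = [sum(f[k] for f in frames) / n_frames for k in range(n_coeffs)]
    return [[f[k] - means[k] for k in range(n_coeffs)] for f in frames]


# A constant channel offset added to every frame is removed by the subtraction.
clean = [[1.0, 2.0], [3.0, 6.0]]
offset = [0.5, -1.0]
distorted = [[c + o for c, o in zip(f, offset)] for f in clean]
normalized_clean = cepstral_mean_subtraction(clean)
normalized_distorted = cepstral_mean_subtraction(distorted)
```

In this toy case the normalized clean and distorted utterances coincide, which is the property that makes the method useful against handset and network variation.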
Authors:
Mubeccel Demirekler,
Ali Haydar,
Page (NA) Paper number 5026
Abstract:
This paper introduces the use of a genetics-based algorithm to reduce
a 24-parameter set (i.e. the base set) to a 5-, 6-, 7-, 8- or 10-parameter
set for each speaker in text-independent speaker identification. The
feature selection is done by finding the features that best discriminate
a person from his/her two closest neighbors. The experimental results
show an approximately 5% increase in the recognition rate
when the reduced set of parameters is used. In addition, the amount of
calculation necessary for speaker recognition in the testing phase using
the reduced set of features is much less than that required using the
complete feature set. Hence it is more desirable to use
the subset of the complete feature set found by the suggested genetic
algorithm.
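
The abstract does not describe the encoding, operators, or fitness function of the genetic algorithm. The sketch below is a generic genetic search over fixed-size feature subsets, with the fitness function supplied by the caller. The population size, mutation rate, crossover scheme, and the toy fitness are all assumptions for illustration; in the setting of the abstract the fitness would measure how well a subset separates a speaker from his/her two closest neighbors.

```python
import random


def select_features(fitness, n_total=24, n_keep=5,
                    pop_size=20, generations=30, seed=0):
    """Search for a subset of n_keep feature indices that maximizes fitness."""
    rng = random.Random(seed)
    population = [tuple(sorted(rng.sample(range(n_total), n_keep)))
                  for _ in range(pop_size)]
    for _ in range(generations):
        # Keep the better half unchanged (elitism) and breed the rest.
        population.sort(key=fitness, reverse=True)
        parents = population[: pop_size // 2]
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            # Crossover: draw the child's features from the parents' union.
            pool = sorted(set(a) | set(b))
            child = set(rng.sample(pool, n_keep))
            # Mutation: occasionally swap one feature for a random other one.
            if rng.random() < 0.2:
                child.discard(rng.choice(sorted(child)))
                unused = [i for i in range(n_total) if i not in child]
                child.add(rng.choice(unused))
            children.append(tuple(sorted(child)))
        population = parents + children
    return max(population, key=fitness)


# Toy fitness: pretend features 0..4 are the informative ones.
informative = {0, 1, 2, 3, 4}


def toy_fitness(subset):
    return len(informative & set(subset))


best = select_features(toy_fitness)
```

Running the search once per speaker with a speaker-specific fitness would yield the per-speaker reduced sets the abstract refers to.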