SWANSEA NIST 2006 EVALUATION OVERVIEW

Speaker Verification System Description
=======================================
UWS speaker verification is based on an LFCC front-end and a GMM system
for speaker adaptation and testing. The following systems were processed and
submitted to NIST:

10sec4w-10sec4w.ndx
 UWS1: weighted linear fusion between [SMM SET1 bilateral scoring] [ALZ SET2] [ALZ SET1] 
 UWS2: [SMM SET1 bilateral scoring]

1conv4w-10sec4w.ndx
 UWS1: [ALZ SET3]
 UWS2: [ALZ SET3 no T-norm]
 UWS3: [SMM SET3] 
1conv4w-1conv4w.ndx
 UWS1: [ALZ SET3]
 UWS2: [ALZ SET3 no T-norm]
 UWS3: [SMM SET3 bilateral scoring] 
1conv4w-1conv2w.ndx
 UWS1: [SMM SET3]
1conv4w-1convmic.ndx
 UWS1: [SMM SET3]
 
3conv4w-10sec4w.ndx
 UWS1: [ALZ SET3]
 UWS2: [ALZ SET3 no T-norm]
 UWS3: [SMM SET3] 
3conv4w-1conv4w.ndx
 UWS1: [ALZ SET3]
 UWS2: [ALZ SET3 no T-norm]
3conv4w-1conv2w.ndx
 UWS1: [SMM SET3]
3conv4w-1convmic.ndx
 UWS1: [SMM SET3]

8conv4w-10sec4w.ndx
 UWS1: [ALZ SET3]
 UWS2: [ALZ SET3 no T-norm]
 UWS3: [SMM SET3] 
8conv4w-1convmic.ndx
 UWS1: [ALZ SET3]
 UWS2: [ALZ SET3 no T-norm]
8conv4w-1conv4w.ndx
 UWS1: [SMM SET3]
8conv4w-1conv2w.ndx
 UWS1: [SMM SET3]

3conv2w-1conv2w.ndx
 UWS1: [SMM SET3]
3conv2w-1conv4w.ndx
 UWS1: [SMM SET3]
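
The weighted linear fusion used for UWS1 in the 10sec4w-10sec4w condition
combines, per trial, the scores of the three subsystems. The fusion weights
are not given in this description, so the sketch below uses made-up weights
and scores purely for illustration; the function name fuse_scores is also
an assumption, not part of any UWS tool.

```python
def fuse_scores(scores_by_system, weights):
    """Weighted linear fusion: fused[i] = sum_k weights[k] * scores_by_system[k][i]."""
    if len(scores_by_system) != len(weights):
        raise ValueError("one weight per system is required")
    n_trials = len(scores_by_system[0])
    if any(len(s) != n_trials for s in scores_by_system):
        raise ValueError("all systems must score the same trials")
    return [
        sum(w * s[i] for w, s in zip(weights, scores_by_system))
        for i in range(n_trials)
    ]


# Hypothetical per-trial scores for two trials (illustrative values only).
smm_set1 = [1.0, -2.0]   # [SMM SET1 bilateral scoring]
alz_set2 = [0.5, -1.0]   # [ALZ SET2]
alz_set1 = [2.0, 0.0]    # [ALZ SET1]

# Assumed weights; the weights actually submitted are not stated here.
weights = [0.5, 0.25, 0.25]

fused = fuse_scores([smm_set1, alz_set2, alz_set1], weights)
print(fused)
```

In practice the weights would be tuned on development trials, and the
subsystem scores should be on comparable scales before they are combined.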

Front-End
Three sets of features are used, all generated with the SPRO4/LIA_spkDet tools

   FFT:
	Frame Size: 20 ms
	Frame Rate: 10 ms
   Filter:
	SET1: no bandwidth filtering
	SET2,3: bandwidth limited to 300-3400 Hz
      
   Speech Detection (cf: [1]):
	SET1: Mean/Std bigaussian method with no pre-normalisation (alpha=0.25)
	SET2,3: Trigaussian method (alpha=0) on the 0-mean 1-variance normalised energy component

   Normalisation:
	All Sets: 0-mean 1-variance

   Feature Vector Size: 
	SET1: 34 = 16 LFCC + 16 delta + Energy + delta Energy
	SET2: 30 = 16 LFCC + 8 delta + first 5 double delta + delta Energy
	SET3: 50 = 19 LFCC + 19 delta + first 11 double delta + delta Energy

   The vector size for SET3 (higher-dimension double deltas removed) was optimized using the 1conv4w-1conv4w NIST05 trials.
   The vector size for SET2 was optimized using the 10sec4w-10sec4w NIST05 trials.
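
The 0-mean 1-variance normalisation applied to all feature sets can be
sketched as follows. This is a minimal illustration, not the SPRO4
implementation: it assumes the statistics are computed per file over the
frames kept by speech detection, and the function name normalise_features
is hypothetical.

```python
import math


def normalise_features(frames):
    """Scale each feature dimension to zero mean and unit variance."""
    n = len(frames)
    dim = len(frames[0])
    means = [sum(f[d] for f in frames) / n for d in range(dim)]
    stds = []
    for d in range(dim):
        var = sum((f[d] - means[d]) ** 2 for f in frames) / n
        # Guard against constant dimensions (assumption: leave them centred at 0).
        stds.append(math.sqrt(var) if var > 0 else 1.0)
    return [[(f[d] - means[d]) / stds[d] for d in range(dim)] for f in frames]


# Toy 2-dimensional "feature file" with three frames (illustrative values only).
frames = [[1.0, 10.0], [3.0, 10.0], [5.0, 10.0]]
normalised = normalise_features(frames)
print(normalised)
```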
   

All data used for the UBMs and T-norm cohorts are from NIST 2004.

GMM System [SMM]:

Bilateral System

   Background Model
      GMM with 512 components
      Trained on the training sets of NIST 2004
      
   Speaker Adaptation
      Mean Only Adaptation

   Testing
      Scoring of best 5 mixture components for each speaker model

   Normalisation
      T-Norm with 200 gender-specific, condition-matched speakers taken from the NIST 2004 database.
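
The mean-only speaker adaptation step listed above is commonly done with
relevance MAP, where only the background model means move towards the
speaker data. The exact update rule and relevance factor used by [SMM] are
not stated here, so the sketch below is an assumption: it uses the standard
relevance-MAP mean update with a relevance factor of 16, diagonal
covariances, and a toy 2-component model instead of 512 components.

```python
import math


def log_gauss(x, mean, var):
    """Log density of a diagonal-covariance Gaussian."""
    return sum(
        -0.5 * (math.log(2 * math.pi * v) + (xi - m) ** 2 / v)
        for xi, m, v in zip(x, mean, var)
    )


def adapt_means(frames, weights, means, variances, relevance=16.0):
    """Relevance-MAP update of the means only; weights and variances are kept."""
    n_comp, dim = len(means), len(means[0])
    occ = [0.0] * n_comp                          # soft frame counts
    first = [[0.0] * dim for _ in range(n_comp)]  # posterior-weighted sums
    for x in frames:
        logp = [math.log(w) + log_gauss(x, m, v)
                for w, m, v in zip(weights, means, variances)]
        top = max(logp)
        post = [math.exp(lp - top) for lp in logp]
        total = sum(post)
        for c in range(n_comp):
            g = post[c] / total
            occ[c] += g
            for d in range(dim):
                first[c][d] += g * x[d]
    # Equivalent to alpha * data_mean + (1 - alpha) * ubm_mean,
    # with alpha = occ / (occ + relevance).
    return [
        [(first[c][d] + relevance * means[c][d]) / (occ[c] + relevance)
         for d in range(dim)]
        for c in range(n_comp)
    ]


# Toy background model (hypothetical values): two 1-dimensional components.
ubm_weights = [0.5, 0.5]
ubm_means = [[-5.0], [5.0]]
ubm_vars = [[1.0], [1.0]]

# Sixteen identical enrolment frames close to the second component.
speaker_frames = [[6.0]] * 16
speaker_means = adapt_means(speaker_frames, ubm_weights, ubm_means, ubm_vars)
print(speaker_means)
```

With 16 frames and a relevance factor of 16, the second mean moves halfway
from the background value towards the data, while the first component,
which sees almost no data, stays at its background value.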

GMM System [ALZ]: based on Alize/LIA_spkDet

   GMM with 2048 components
   UBM: gender specific, ~200 files from 1conv4w (NIST04)
   Normalisation:
      T-Norm with a gender-specific cohort of ~100 from 1conv4w
   - for the 10sec4w-10sec4w condition only, 512 GMM components were used, with the T-Norm cohort taken from 10sec4w [NIST2004]
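
T-norm, used by both [SMM] and [ALZ], rescales each trial score with the
mean and standard deviation of the scores that a cohort of impostor models
obtains on the same test segment. A minimal sketch follows; the cohort
scores are made-up values, the function name t_norm is hypothetical, and a
real cohort would hold the ~100 to 200 gender-specific models listed above.

```python
import math


def t_norm(raw_score, cohort_scores):
    """Normalise a trial score by cohort statistics from the same test segment."""
    n = len(cohort_scores)
    mean = sum(cohort_scores) / n
    var = sum((s - mean) ** 2 for s in cohort_scores) / n
    std = math.sqrt(var)
    if std == 0.0:
        raise ValueError("cohort scores have zero spread")
    return (raw_score - mean) / std


# Hypothetical scores of four cohort models against one test segment.
cohort = [-1.0, 0.0, 1.0, 0.0]
normalised_score = t_norm(2.0, cohort)
print(normalised_score)
```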

Execution time:
   Refer to LIA system description for details about [ALZ] system
   [SMM] has similar execution times.


In deriving the submitted scores for the NIST 2006
speaker recognition evaluation, UWS did not make use
of data from the corresponding NIST 2005 evaluation.

In previous years the UWS submission has been almost
wholly based on Roland Auckenthaler's SVT commercial
system, the only exception being for the 10sec-10sec
condition. The 2006 UWS submission is not linked in any
way to SVT and hence the only like-for-like, year-on-year
comparison would be on the 10sec-10sec condition.


[1]: J.-F. Bonastre, N. Scheffer, C. Fredouille, D. Matrouf, "NIST'04 speaker recognition evaluation campaign: new LIA speaker detection platform based on ALIZE toolkit",
NIST SRE'04 Workshop: speaker detection evaluation campaign, June 2004, Toledo, Spain.