SWANSEA NIST 2006 EVALUATION OVERVIEW

Speaker Verification System Description
=======================================

UWS speaker verification is based on an LFCC front-end and a GMM system
for speaker adaptation and testing.

The following systems were processed and submitted to NIST:

10sec4w-10sec4w.ndx
    UWS1: weighted linear fusion of [SMM SET1 bilateral scoring], [ALZ SET2] and [ALZ SET1]
    UWS2: [SMM SET1 bilateral scoring]

1conv4w-10sec4w.ndx
    UWS1: [ALZ SET3]
    UWS2: [ALZ SET3 no T-norm]
    UWS3: [SMM SET3]

1conv4w-1conv4w.ndx
    UWS1: [ALZ SET3]
    UWS2: [ALZ SET3 no T-norm]
    UWS3: [SMM SET3 bilateral scoring]

1conv4w-1conv2w.ndx
    UWS1: [SMM SET3]

1conv4w-1convmic.ndx
    UWS1: [SMM SET3]

3conv4w-10sec4w.ndx
    UWS1: [ALZ SET3]
    UWS2: [ALZ SET3 no T-norm]
    UWS3: [SMM SET3]

3conv4w-1conv4w.ndx
    UWS1: [ALZ SET3]
    UWS2: [ALZ SET3 no T-norm]

3conv4w-1conv2w.ndx
    UWS1: [SMM SET3]

3conv4w-1convmic.ndx
    UWS1: [SMM SET3]

8conv4w-10sec4w.ndx
    UWS1: [ALZ SET3]
    UWS2: [ALZ SET3 no T-norm]
    UWS3: [SMM SET3]

8conv4w-1convmic.ndx
    UWS1: [ALZ SET3]
    UWS2: [ALZ SET3 no T-norm]

8conv4w-1conv4w.ndx
    UWS1: [SMM SET3]

8conv4w-1conv2w.ndx
    UWS1: [SMM SET3]

3conv2w-1conv2w.ndx
    UWS1: [SMM SET3]

3conv2w-1conv4w.ndx
    UWS1: [SMM SET3]

Front-End

Three sets of features are used, all generated with the SPRO4/LIASpkd tools.

FFT:
    Frame Size: 20 ms
    Frame Rate: 10 ms

Filter:
    SET1:   no bandwidth filtering
    SET2,3: bandwidth limited to 300-3400 Hz

Speech Detection (cf. [1]):
    SET1:   Mean/Std bigaussian method with no pre-normalisation (alpha=0.25)
    SET2,3: Trigaussian, alpha=0, on the 0-mean 1-variance normalised energy component

Normalisation:
    All sets: 0-mean 1-variance

Feature Vector Size:
    SET1: 34 = 16 LFCC + 16 delta + Energy + delta Energy
    SET2: 30 = 16 LFCC + 8 delta + first 5 double delta + delta Energy
    SET3: 50 = 19 LFCC + 19 delta + first 11 double delta + delta Energy

The vector size for SET3 (higher-dimension double deltas removed) has been
optimized using the 1conv4w-1conv4w NIST05 trials.
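
As an illustration of the 0-mean 1-variance normalisation listed under
Front-End, the sketch below scales each coefficient of a feature matrix to
zero mean and unit variance. It is not the SPRO4 implementation: the function
name `normalise_features` is hypothetical, and computing the statistics over
all frames passed in (for example the speech frames of one file) is an
assumption, since the description does not state the normalisation window.

```python
from statistics import mean, pstdev


def normalise_features(frames):
    """Scale each coefficient to zero mean and unit variance over `frames`.

    frames: non-empty list of equal-length feature vectors (lists of floats).
    """
    columns = list(zip(*frames))  # one tuple per coefficient
    mus = [mean(col) for col in columns]
    # Assumption: a constant coefficient maps to 0 rather than dividing by zero.
    sigmas = [pstdev(col) or 1.0 for col in columns]
    return [
        [(x - mu) / sigma for x, mu, sigma in zip(frame, mus, sigmas)]
        for frame in frames
    ]


# Two toy frames with two coefficients each; the second coefficient is constant.
normalised = normalise_features([[1.0, 10.0], [3.0, 10.0]])
print(normalised)
```
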
The vector size for SET2 has been optimized using the 10sec4w-10sec4w NIST05
trials.

All data used for the UBM and T-norm cohorts are from NIST 2004.

GMM System [SMM]: Bilateral System

Background Model:
    GMM with 512 components
    Trained on the training sets of NIST 2004
Speaker Adaptation:
    Mean-only adaptation
Testing:
    Scoring of the best 5 mixture components for each speaker model
Normalisation:
    T-norm with 200 gender-specific, condition-matched speakers taken from
    the NIST 2004 database

GMM System [ALZ]: based on Alize/LIA_spkDet

    GMM with 2048 components
    UBM: gender specific, ~200 files from 1conv4w (NIST04)
    Normalisation: T-norm with a gender-specific cohort of ~100 from 1conv4w
    - For the 10sec4w-10sec4w condition only, 512 GMM components have been
      used, with the T-norm cohort taken from 10sec4w (NIST 2004)

Execution time:
    Refer to the LIA system description for details about the [ALZ] system.
    [SMM] has similar execution times.

In deriving the submitted scores for the NIST 2006 speaker recognition
evaluation, UWS did not make use of data from the corresponding NIST 2005
evaluation.

In previous years the UWS submission was almost wholly based on Roland
Auckenthaler's SVT commercial system, the only exception being the
10sec-10sec condition. The 2006 UWS submission is not linked in any way to
SVT, and hence the only like-for-like, year-on-year comparison is on the
10sec-10sec condition.

[1]: J.-F. Bonastre, N. Scheffer, C. Fredouille, D. Matrouf, "NIST'04 speaker
     recognition evaluation campaign: new LIA speaker detection platform
     based on ALIZE toolkit", 2004 NIST SRE'04 Workshop: speaker detection
     evaluation campaign, June 2004, Toledo, Spain.
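
Note on T-norm: both the [SMM] and [ALZ] systems apply T-norm with a
gender-specific cohort. A minimal sketch of the standard test normalisation
is given below, assuming the raw trial score and the scores of the same test
segment against each cohort model are already available. The function name
`t_norm` is hypothetical, and the use of the population standard deviation is
an assumption; neither is taken from the UWS or LIA code.

```python
from statistics import mean, pstdev


def t_norm(raw_score, cohort_scores):
    """Normalise one trial score with the cohort scores of the same test segment.

    raw_score: score of the test segment against the claimed speaker model.
    cohort_scores: scores of the same test segment against each cohort model.
    """
    if len(cohort_scores) < 2:
        raise ValueError("T-norm needs at least two cohort scores")
    mu = mean(cohort_scores)
    sigma = pstdev(cohort_scores)  # assumption: population standard deviation
    if sigma == 0:
        raise ValueError("cohort scores are all identical")
    return (raw_score - mu) / sigma


# Toy example: a raw score compared with a two-model cohort.
print(t_norm(3.0, [1.0, 3.0]))
```

In practice the cohort would hold the ~100 or 200 gender-matched models
described above, and the normalised score would be compared against a single
speaker-independent threshold.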