-------------------------------------------------------------
Description of the IRISA Systems - NIST SRE06
-------------------------------------------------------------

Introduction:
-------------

This year, IRISA presents two systems: a primary one based on GMMs
(identified as IRI_1) and a secondary system based on decision trees
(identified as IRI_2). Both systems output log-likelihood ratio (LLR)
scores.

We submitted results for 3 conditions:
  1conv4w-1conv4w
  1conv4w-10sec4w
  10sec4w-10sec4w

Both IRISA systems are based on Gaussian Mixture Models (GMMs). The
speaker models are adapted from a Universal Background Model (UBM),
and Tnorm is applied for scoring.

The following parameters apply to the various systems:

- Pre-processing: silence removal
- Features:
  * 12 linear frequency cepstral coefficients + 12 derivatives + energy
  * removal of low-energy frames
  * cepstral mean subtraction and variance normalization
- UBM: Maximum Likelihood training of two gender-dependent GMMs using
  ~500 speakers per gender from the Switchboard II phase 5, NIST04 and
  NIST05 corpora. The two models are concatenated to form a
  gender-independent UBM when the training condition is 1conv4w.
  * The IRI_1 system is based on a 1024-component GMM with diagonal
    covariances.
  * The IRI_2 system is based on a 256-component GMM with diagonal
    covariances.
- Speaker models: MAP adaptation of the means of the UBM, 1 pass, r=8
- Scoring: 100 Tnorm models per gender are used to normalize the
  scores in a gender-dependent fashion
- Decision: gender-dependent thresholds are determined from tests on a
  development set.

All runtimes are given for a Pentium IV CPU @ 3.7 GHz.

----------------------------------------
System IRI_1 (primary): CORE GMM SYSTEM
----------------------------------------

See description above.
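The MAP adaptation step above (means only, one pass, relevance factor
r=8) can be sketched as follows. This is a minimal illustration under
standard GMM-UBM assumptions; the function name and array layout are
ours, not the IRISA implementation.

```python
import numpy as np

def map_adapt_means(ubm_means, ubm_covs, ubm_weights, frames, r=8.0):
    """One-pass MAP adaptation of GMM means (diagonal covariances).

    ubm_means:   (M, D) component means of the UBM
    ubm_covs:    (M, D) diagonal covariances
    ubm_weights: (M,)   mixture weights
    frames:      (T, D) feature frames of the enrolment utterance
    r:           relevance factor (r=8 in the description above)
    """
    # Log density of every frame under every component.
    diff = frames[:, None, :] - ubm_means[None, :, :]            # (T, M, D)
    log_dens = -0.5 * np.sum(diff**2 / ubm_covs
                             + np.log(2.0 * np.pi * ubm_covs), axis=2)
    log_post = np.log(ubm_weights) + log_dens                    # (T, M)
    log_post -= np.logaddexp.reduce(log_post, axis=1, keepdims=True)
    post = np.exp(log_post)                                      # responsibilities

    n = post.sum(axis=0)                                         # soft counts (M,)
    ex = post.T @ frames / np.maximum(n, 1e-10)[:, None]         # first-order stats
    alpha = (n / (n + r))[:, None]                               # adaptation coeffs
    # Components with little data stay close to the UBM means.
    return alpha * ex + (1.0 - alpha) * ubm_means
```

With enrolment data concentrated near one component, only that
component's mean moves noticeably toward the data, which is the
behaviour that makes MAP adaptation robust on short (10sec4w)
training conditions.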
Runtimes for the training of the models:
  Training condition 1conv4w: 25 hours   Memory: ~100 MB
  Training condition 10sec4w:  2 hours   Memory: ~10 MB

-------------------------------------------------
System IRI_2 (secondary): DECISION TREES SYSTEM
-------------------------------------------------

Decision trees for LLR estimation [1].

This system uses decision trees as a fast way to estimate the
log-likelihood ratio function of each speaker, based on their GMM
models with 256 components. A priori knowledge of the GMM is used for
the creation of the decision trees.

- Training: Each decision tree is trained with the CART method applied
  to the feature frames. A score is then reassigned to each node of
  the resulting tree for the corresponding feature region (linear
  regression estimation of the LLR). The trees have 340 leaves on
  average.
- Scoring: Each feature frame is scored by a linear function according
  to the leaf it falls in. The final score is the mean of the
  per-frame scores. Gender-dependent decision trees are used.
- Decision: gender-dependent thresholds are determined from tests on a
  development set.
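The per-frame tree scoring described above can be sketched as follows.
The node/leaf classes and the toy tree in the test are illustrative
assumptions, not the CART-trained trees of the actual system; the only
properties taken from the description are the linear leaf scores and
the mean over frames.

```python
import numpy as np

class Leaf:
    """Leaf: the LLR in this feature-space region is approximated by a
    linear function w.x + b (linear regression on the GMM LLR)."""
    def __init__(self, w, b):
        self.w, self.b = np.asarray(w, dtype=float), float(b)

    def score(self, x):
        return float(self.w @ x + self.b)

class Node:
    """Internal node: split on one feature dimension vs. a threshold."""
    def __init__(self, dim, thr, left, right):
        self.dim, self.thr, self.left, self.right = dim, thr, left, right

    def score(self, x):
        child = self.left if x[self.dim] <= self.thr else self.right
        return child.score(x)

def tree_llr(tree, frames):
    """Utterance-level score: mean of the per-frame leaf scores."""
    return float(np.mean([tree.score(x) for x in frames]))
```

Because each frame costs only a few threshold comparisons plus one dot
product, instead of 256 Gaussian evaluations, scoring is much faster
than the GMM system, as the runtimes below reflect.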
Runtime and memory for the training of the trees:
  Training condition 1conv4w: 50 hours   Memory: ~500 MB
  Training condition 10sec4w: 50 hours   Memory: ~500 MB

-------------------------------------------------------------------------
SYSTEMS APPLIED FOR THE VARIOUS TEST CONDITIONS (Train Cond-Test Cond)
-------------------------------------------------------------------------

----------------
1conv4w-1conv4w:
----------------

PARAMETRIZATION:
  Runtime: 10 hours   Memory: ~10 MB

SCORING:
  System IRI_1 (primary): CORE GMM SYSTEM
    Runtime: 24 hours     Memory: ~100 MB
  System IRI_2: DECISION TREES
    Runtime: 2 hours      Memory: negligible

----------------
1conv4w-10sec4w:
----------------

PARAMETRIZATION:
  Runtime: 1 hour   Memory: ~1 MB

SCORING:
  System IRI_1 (primary): CORE GMM SYSTEM
    Runtime: 5 hours      Memory: ~10 MB
  System IRI_2: DECISION TREES
    Runtime: 45 minutes   Memory: negligible

----------------
10sec4w-10sec4w:
----------------

PARAMETRIZATION:
  Runtime: 1 hour   Memory: ~1 MB

SCORING:
  System IRI_1 (primary): CORE GMM SYSTEM
    Runtime: 5 hours      Memory: ~10 MB
  System IRI_2: DECISION TREES
    Runtime: 45 minutes   Memory: negligible

----------
References
----------

[1] Gilles Gonon, Frédéric Bimbot and Rémi Gribonval, "Decision Trees
    with Improved Efficiency for Fast Speaker Verification", Proc.
    EUROSPEECH 2005, vol. 4, pp. 2661-2664.