All the systems are based on the ALIZE toolkit, except for the acoustic parameterization, which is performed
using the SPRO software.


Description of the UPMC_PRC_BaseLine system
--------------------------------
The UPMC_PRC_BaseLine system is a GMM baseline system.

1. Parameterization
1.1 Front end processing:
The signal is characterized by 32 coefficients including 16 linear
frequency cepstral coefficients (LFCC) and their first derivative
coefficients, obtained as follows. 24 filter bank coefficients are
first computed over 20ms Hamming windowed frames at a 10ms frame
rate. Bandwidth is limited to the 300-3400Hz range. Filter bank
coefficients are then converted to 16th order cepstral
coefficients using a discrete cosine transform (DCT).
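
As an illustration of the last two steps (filter bank to cepstra, then first derivatives), here is a minimal sketch. It is not the SPRO implementation: the use of log filter bank energies as input, the DCT scaling, the exclusion of the 0th coefficient and the symmetric-difference delta are all assumptions, and the function names are hypothetical.

```python
import math

def filterbank_to_cepstra(log_energies, n_ceps=16):
    """DCT-II of the filter bank outputs, keeping coefficients 1..n_ceps.

    Assumption: inputs are log energies and c0 is not kept.
    """
    n = len(log_energies)
    scale = math.sqrt(2.0 / n)
    return [
        scale * sum(e * math.cos(math.pi * i * (j + 0.5) / n)
                    for j, e in enumerate(log_energies))
        for i in range(1, n_ceps + 1)
    ]

def add_deltas(frames):
    """Append first derivatives to each frame (16 -> 32 coefficients).

    Assumption: delta = (next - previous) / 2, with edge frames repeated.
    """
    last = len(frames) - 1
    out = []
    for t, cepstra in enumerate(frames):
        prev = frames[max(t - 1, 0)]
        nxt = frames[min(t + 1, last)]
        delta = [(b - a) / 2.0 for a, b in zip(prev, nxt)]
        out.append(list(cepstra) + delta)
    return out
```

With 24 filter bank values per frame, filterbank_to_cepstra returns 16 values and add_deltas produces the 32-dimensional vectors described above.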

1.2 Frame removal
Here, the energy coefficients are first normalized by mean removal and variance normalization
so as to fit a 0-mean and 1-variance distribution. They are then used to train a three-component
GMM, whose purpose is to select informative frames. The X% most energetic frames are selected
through the GMM, with:

X=w_1+(merged*alpha*w_2) where w_1 is the weight of the highest-energy Gaussian component,
w_2 is the weight of the middle component, "merged" is a binary flag (0 or 1),
and "alpha" is a weighting parameter.
The value of "merged" is decided according to the likelihood loss incurred when merging
Gaussian components 1 and 2, and when merging components 2 and 3. If the loss is higher for components 1 and 2,
"merged" is set to 0; otherwise it is set to 1.
"alpha" was empirically fixed to 0.0.

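The selection rule can be written down directly. The sketch below is only a transcription of the formula for X and of the "merged" decision; the function names are hypothetical, the two likelihood losses are assumed to be computed elsewhere, and X is returned as a fraction of frames (the component weights sum to 1) rather than as a percentage.

```python
def merged_flag(loss_merge_1_2, loss_merge_2_3):
    """Return 0 if merging components 1 and 2 loses more likelihood, else 1."""
    return 0 if loss_merge_1_2 > loss_merge_2_3 else 1

def selected_fraction(w_1, w_2, merged, alpha=0.0):
    """X = w_1 + merged * alpha * w_2, with alpha defaulting to the stated 0.0."""
    return w_1 + merged * alpha * w_2
```

With the stated value alpha = 0.0, the selected fraction equals w_1 whatever the value of "merged".
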
Once the speech segments of a signal are selected, a final step refines the speech segmentation:
- speech segments that overlap between the two sides of a conversation are removed
- morphological rules are applied to the speech segments to avoid overly short ones, by adding or removing some speech frames.
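
The exact rules are not given here, so the following is only a sketch of what such a refinement could look like on per-frame speech labels. The frame-level overlap removal, the order of the two smoothing passes and the thresholds min_gap and min_speech are all hypothetical.

```python
def remove_overlap(side_a, side_b):
    """Drop frames labelled as speech on both sides (frame-level assumption)."""
    keep_a = [a and not b for a, b in zip(side_a, side_b)]
    keep_b = [b and not a for a, b in zip(side_a, side_b)]
    return keep_a, keep_b

def runs(labels):
    """Yield (value, start, end) for each run of identical labels."""
    start = 0
    for i in range(1, len(labels) + 1):
        if i == len(labels) or labels[i] != labels[start]:
            yield labels[start], start, i
            start = i

def smooth(labels, min_gap=3, min_speech=5):
    """Fill short pauses inside speech, then drop speech runs that stay too short."""
    out = list(labels)
    for value, s, e in list(runs(out)):
        # a short non-speech gap with speech on both sides becomes speech
        if not value and s > 0 and e < len(out) and e - s < min_gap:
            out[s:e] = [True] * (e - s)
    for value, s, e in list(runs(out)):
        # a speech run shorter than min_speech frames is discarded
        if value and e - s < min_speech:
            out[s:e] = [False] * (e - s)
    return out
```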


1.3 Parameter normalization
Finally, the parameter vectors are normalized to
fit a 0-mean and 1-variance distribution. The mean
and variance estimators used for the normalization are computed
file by file on all the frames kept after applying the frame
removal processing.
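
A minimal sketch of this per-file normalization is given below. It assumes the frames of one file are available as a list of equal-length vectors and that the variance is estimated with the biased (divide by N) estimator; the function name is hypothetical.

```python
import math

def normalize_file(frames):
    """Mean and variance normalization, computed per coefficient over one file."""
    n = len(frames)
    dim = len(frames[0])
    means = [sum(f[d] for f in frames) / n for d in range(dim)]
    stds = [math.sqrt(sum((f[d] - means[d]) ** 2 for f in frames) / n)
            for d in range(dim)]
    return [
        # a constant coefficient (zero deviation) is mapped to 0.0
        [(f[d] - means[d]) / stds[d] if stds[d] > 0 else 0.0 for d in range(dim)]
        for f in frames
    ]
```

Only the frames kept by the frame removal step should be passed to this function, so that the estimators are not biased by the discarded frames.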


2. Model Training
All the models are based on the GMM scheme.

2.1 World models
The world modeling relies on three steps: initialization, training and warping.
The resulting world models are gender-dependent Gaussian mixture models with 512 components and
diagonal covariance matrices.

Initialization:
The initialization step consists in partitioning the acoustic space into n classes.
This is done via a random selection of frames. More precisely, for a better separation of the initial classes,
frames are drawn from the entire training signal according to a selection probability, followed by one iteration of the EM algorithm
to estimate the GMM parameters. The system is tuned so that each class is initialized with about 1000 frames.

Training:
The second step of the process, the estimation of the world model parameters, is performed with the EM algorithm.
For this task, instead of using all the training signals in their temporal order,
we select frames randomly according to a selection probability.

For each world model, 25 iterations of EM are computed as follows:
- during the first 21 iterations, only 10% of the total number of frames is used in the EM algorithm; this 10% of frames is selected randomly at each new iteration
- during the last 4 iterations, the entire signal is used, as usual, in its temporal order.
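
The iteration schedule above can be sketched as a generator of frame indices, one list per EM iteration. This is not the ALIZE code: the function name and parameters are hypothetical, and each frame is assumed to be kept independently with probability 0.10, so the subset size is only approximately 10%.

```python
import random

def em_frame_schedule(n_frames, n_iter=25, n_random_iter=21, fraction=0.10, seed=0):
    """Yield, for each EM iteration, the indices of the frames to use."""
    rng = random.Random(seed)
    for it in range(n_iter):
        if it < n_random_iter:
            # new random subset at every iteration (about 10% of the frames)
            yield [t for t in range(n_frames) if rng.random() < fraction]
        else:
            # last iterations: all frames, in temporal order
            yield list(range(n_frames))
```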

Throughout the process, variance flooring is applied so that no variance value is less than 0.5.

Warping:
The overall training process described above does not preserve the normalization hypothesis
(induced by the parameterization step), i.e. a global 0-mean and 1-variance distribution.
Variance flooring may be one cause of the mismatch between the obtained distribution and the expected one.
To cancel this mismatch, a model warping technique is finally applied to the world model.
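
The warping technique itself is not detailed in this description. To make the mismatch concrete, the sketch below computes the global mean and variance implied by a diagonal GMM along one dimension, and then applies a simple affine rescaling that restores a global 0-mean and 1-variance distribution. The affine form is an assumption used for illustration only and may differ from the warping actually applied; the function names are hypothetical.

```python
import math

def global_moments(weights, means, variances):
    """Global mean and variance of a 1-D mixture of Gaussians."""
    m = sum(w * mu for w, mu in zip(weights, means))
    second = sum(w * (v + mu * mu) for w, mu, v in zip(weights, means, variances))
    return m, second - m * m

def warp_to_standard(weights, means, variances):
    """Shift and scale the components so the mixture has mean 0 and variance 1."""
    m, var = global_moments(weights, means, variances)
    s = math.sqrt(var)
    new_means = [(mu - m) / s for mu in means]
    new_variances = [v / (s * s) for v in variances]
    return new_means, new_variances
```

For a multi-dimensional diagonal-covariance model, the same computation would be repeated independently for each coefficient.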

2.2 Client model
Client models are derived by Bayesian
adaptation (1 iteration of the Maximum A Posteriori method) of the world model.
Only the means of each Gaussian are adapted. The amount of adaptation for each mean depends
on the amount of data available; this is controlled by a relevance factor of 14.
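
For one Gaussian, the mean-only MAP update is commonly written as an interpolation between the world mean and the mean of the client data assigned to that Gaussian. The sketch below assumes this usual form, with the occupancy (soft frame count) and the data mean computed beforehand from the client frames; the function name is hypothetical.

```python
def map_adapt_mean(world_mean, occupancy, data_mean, relevance=14.0):
    """Mean-only MAP update for one Gaussian.

    alpha = n / (n + r): the more client data a Gaussian receives,
    the further its mean moves away from the world model.
    """
    alpha = occupancy / (occupancy + relevance)
    return [alpha * x + (1.0 - alpha) * m for m, x in zip(world_mean, data_mean)]
```

With this form, a Gaussian that receives an occupancy of 14 frames moves halfway toward the client data, and a Gaussian that receives no data keeps the world mean.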

Due to this speaker modeling process, the normalization hypothesis is no longer respected in the resulting speaker GMM.
The mismatch between the obtained distribution and the expected one (0-mean and 1-variance distribution)
is mainly caused by the lack of adaptation data. To cancel this mismatch, all the speaker models are normalized,
at each iteration, by applying the warping technique. Only the means of the GMMs are warped here.

3. Test, normalization, and decision
The speaker detection test relies on a log-likelihood ratio
computed on the 10 best Gaussian components.
The classical Tnorm normalization technique is then applied to each test likelihood ratio.
Finally, the Tnormed log-likelihood scores are compared with a threshold to make the decision.
This threshold is gender independent and set at the best DCF point estimated on the SRE'05 development set.
The threshold for the trials is fixed to 2.47.
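
The normalization and decision steps can be sketched as follows. The raw log-likelihood ratio and the scores of the same test segment against the Tnorm cohort models are assumed to be computed elsewhere; the use of the population standard deviation and of ">=" for acceptance are assumptions, and the function names are hypothetical.

```python
import statistics

def tnorm(raw_llr, cohort_llrs):
    """Normalize a score with the mean and deviation of the cohort scores."""
    mu = statistics.mean(cohort_llrs)
    sigma = statistics.pstdev(cohort_llrs)
    return (raw_llr - mu) / sigma

def decide(tnormed_score, threshold=2.47):
    """Accept the trial when the Tnormed score reaches the threshold."""
    return tnormed_score >= threshold
```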


4. Corpora used
The corpora we used for world model training and for normalization are the ones covered by the
2006 NIST speaker recognition evaluation license agreement:

* NIST'04 corpus
* NIST'05 corpus
  
The gender-dependent world models are trained using 6h and 8h of speech for male and female models respectively.

Gender-dependent TNorm populations are extracted from the NIST'04 corpus: 76 speakers for the male population
and 113 for the female population.


5. Computational time
All the computations were performed on a single 3GHz processor.
  
Gender-dependent world model:	23h00
Gender-dependent TNorm model:	4h
Client model train (1conv4w):	22h00
Trials (1conv4w):		34h00
Tests TNorm (1conv4w):		53h


************************************************************************************************************************
Description of the UPMC_PRC_EvolFilerBank system
************************************************************************************************************************

The UPMC_PRC_EvolFilerBank system differs from the UPMC_PRC_BaseLine system in that it uses
a different parameterization method.

1. Parameterization
1.1 Front end processing:
The signal is characterized by 32 coefficients including 16 frequency cepstral coefficients (LFCC) 
and their first derivative coefficients, obtained as follows:
24 filter bank coefficients are first computed over 20ms Hamming windowed frames at a 10ms frame rate. 
Bandwidth is limited to the 300-3400Hz range. 

The particularity of this feature extractor is that
the filter bank was designed to produce features that are as decorrelated as possible
from the features obtained with a linearly scaled filter bank.
The following table gives the center frequency and the bandwidth
of each triangular filter in the bank. This filter bank was obtained by a data-driven algorithm.

Center frequency (Hz)	Bandwidth (Hz)

 392.2			188.2353
 439.2			  31.3726
 502.0			219.6078
 596.1			156.8628
 752.9			470.5883
 941.2			282.3529
1051.0			188.2353
1129.4			156.8627
1239.2			282.3529
1364.7			250.9804
1490.2			282.3529
1600.0			156.8628
1709.8			282.3529
1851.0			282.3529
1976.5			219.6078
2117.6			345.0980
2258.8			219.6079
2447.1			564.7059
2635.3			188.2353
2760.8			345.0980
2949.0			407.8431
3074.5			125.4902
3152.9			188.2353
3262.7			282.3530
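
To show how one row of the table could define a filter, the sketch below evaluates a triangular weight at a given frequency. It assumes that each triangle is symmetric around its center frequency and that the listed bandwidth is the full width of its base; this reading is not stated explicitly, but under it the first filter starts near 298 Hz and the last one ends near 3404 Hz, which is consistent with the 300-3400Hz band limit. The function name is hypothetical.

```python
def triangular_weight(freq_hz, center_hz, bandwidth_hz):
    """Weight of a symmetric triangular filter at freq_hz.

    Assumption: bandwidth_hz is the full base width, so the weight is 1 at
    the center and falls linearly to 0 at center +/- bandwidth / 2.
    """
    half = bandwidth_hz / 2.0
    return max(0.0, 1.0 - abs(freq_hz - center_hz) / half)
```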


Filter bank coefficients are then converted to 16th order cepstral coefficients using a discrete cosine transform (DCT).


The rest of this system is strictly identical to the UPMC_PRC_BaseLine system,
except for the decision threshold, which is fixed to 2.59.


************************************************************************************************************************
Description of the UPMC_PRC_PRIMARY system
************************************************************************************************************************

This system is a fusion of the UPMC_PRC_BaseLine and the UPMC_PRC_EvolFilerBank systems.
This fusion consists in a weighted sum of the two systems' outputs.
The weights are 0.8 and 0.2 for the UPMC_PRC_BaseLine and the UPMC_PRC_EvolFilerBank systems respectively.
They were determined empirically in order to minimize the DCF on the NIST'05 evaluation database.

For this system, the threshold is fixed to 2.45.
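
The fusion and decision for one trial can be sketched as follows. The two inputs are assumed to be the output scores of the two systems for the same trial, and acceptance with ">=" is an assumption; the function names are hypothetical.

```python
def fuse(baseline_score, evol_score, w_baseline=0.8, w_evol=0.2):
    """Weighted sum of the two systems' scores for one trial."""
    return w_baseline * baseline_score + w_evol * evol_score

def decide(fused_score, threshold=2.45):
    """Accept the trial when the fused score reaches the threshold."""
    return fused_score >= threshold
```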