In the IOA-1 system (the primary system), the probability distribution of each target speaker is modeled by a Gaussian mixture model (GMM).
A universal background model (UBM), also a GMM, is used as the alternative-hypothesis model, and target models are derived from it using Bayesian adaptation.
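A target GMM and the UBM act as competing hypotheses, so a test segment is typically scored by how much better the target model explains its frames than the UBM does. The following is a minimal Python sketch of the usual average log-likelihood-ratio score, assuming diagonal covariances. The class `DiagGMM` and function `llr_score` are hypothetical names; the description does not specify the scoring code.

```python
import math


class DiagGMM:
    """Hypothetical GMM with diagonal covariances: weights[i], means[i][d], variances[i][d]."""

    def __init__(self, weights, means, variances):
        self.weights = list(weights)
        self.means = [list(m) for m in means]
        self.variances = [list(v) for v in variances]

    def log_likelihood(self, x):
        """log p(x) = log sum_i w_i N(x; mu_i, diag(var_i)), computed with log-sum-exp."""
        logs = []
        for w, mu, var in zip(self.weights, self.means, self.variances):
            ll = math.log(w)
            for xd, md, vd in zip(x, mu, var):
                ll -= 0.5 * (math.log(2.0 * math.pi * vd) + (xd - md) ** 2 / vd)
            logs.append(ll)
        top = max(logs)
        return top + math.log(sum(math.exp(l - top) for l in logs))


def llr_score(frames, target, ubm):
    """Average per-frame log-likelihood ratio: log p(x|target) - log p(x|UBM)."""
    total = sum(target.log_likelihood(x) - ubm.log_likelihood(x) for x in frames)
    return total / len(frames)
```

A positive score favors the target hypothesis. With 2048 mixtures, practical systems often evaluate only the top-scoring UBM components per frame for speed; whether IOA-1 does so is not stated.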
Feature mapping and T-norm (test normalization) were applied.
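T-norm rescales each raw score using the scores that the same test segment obtains against a cohort of impostor models. A minimal Python sketch follows; the function name `tnorm` is hypothetical, and the cohort size and composition, as well as the use of the sample (rather than population) standard deviation, are assumptions not given in the description.

```python
import statistics


def tnorm(raw_score, cohort_scores):
    """T-norm: standardize a test segment's score against a target model using the
    scores the same segment obtains against a cohort of impostor models."""
    mu = statistics.fmean(cohort_scores)
    sigma = statistics.stdev(cohort_scores)  # sample standard deviation (assumed choice)
    return (raw_score - mu) / sigma
```

Because the cohort statistics are computed per test segment, T-norm mainly compensates for test-condition shifts in the score distribution.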
A 16-dimensional mel-cepstral vector is extracted from the speech signal every 10 ms using a 20 ms window.
Band-limiting is performed by retaining only the filterbank outputs in the 100 Hz to 3800 Hz frequency range.
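The band-limiting step can be sketched in Python as follows. This assumes telephone speech sampled at 8 kHz (so a 20 ms window is 160 samples with an 80-sample hop), a hypothetical 24-filter mel bank spanning 0 Hz to 4000 Hz, the common mel formula 2595·log10(1 + f/700), and that a filter counts as "in range" when its centre frequency does; none of these details are stated in the description. All function names are hypothetical.

```python
import math


def hz_to_mel(f):
    return 2595.0 * math.log10(1.0 + f / 700.0)


def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def mel_center_frequencies(n_filters, f_max):
    """Centre frequencies (Hz) of n_filters triangular filters spaced uniformly
    on the mel scale between 0 Hz and f_max."""
    step = hz_to_mel(f_max) / (n_filters + 1)
    return [mel_to_hz(step * (k + 1)) for k in range(n_filters)]


def band_limit(log_energies, centers_hz, f_low=100.0, f_high=3800.0):
    """Keep only the filterbank outputs whose centre frequency lies in [f_low, f_high]."""
    return [e for e, fc in zip(log_energies, centers_hz) if f_low <= fc <= f_high]


def cepstra(log_energies, n_ceps=16):
    """DCT-II of the retained log filterbank energies, returning c1..c_n_ceps
    (c0, the overall level term, is dropped; an assumed choice)."""
    n = len(log_energies)
    return [
        sum(e * math.cos(math.pi * k * (j + 0.5) / n) for j, e in enumerate(log_energies))
        for k in range(1, n_ceps + 1)
    ]
```

With these assumptions, only the lowest filter (centred near 55 Hz) falls outside the band, and the remaining 23 log energies yield the 16 cepstral coefficients.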
Delta and delta-delta cepstra are then computed over a ±2 frame span and appended to the cepstral vector,
producing a 48-dimensional feature vector.
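The ±2 frame delta computation is commonly done with the regression formula d_t = Σ_{n=1..2} n·(c_{t+n} − c_{t−n}) / (2·Σ_{n=1..2} n²). A minimal Python sketch follows, assuming edge frames are repeated for padding and that delta-deltas are obtained by applying the same operator to the deltas; the description does not specify either choice, and the function names are hypothetical.

```python
def deltas(frames, span=2):
    """Regression deltas over +-span frames; edges padded by repeating the first/last frame."""
    T = len(frames)
    denom = 2 * sum(n * n for n in range(1, span + 1))  # 10 for span = 2
    out = []
    for t in range(T):
        d = []
        for dim in range(len(frames[0])):
            acc = 0.0
            for n in range(1, span + 1):
                nxt = frames[min(t + n, T - 1)][dim]
                prv = frames[max(t - n, 0)][dim]
                acc += n * (nxt - prv)
            d.append(acc / denom)
        out.append(d)
    return out


def append_deltas(cepstra):
    """16 cepstra + 16 deltas + 16 delta-deltas -> 48-dimensional frames."""
    d = deltas(cepstra)
    dd = deltas(d)
    return [list(c) + x + y for c, x, y in zip(cepstra, d, dd)]
```

For a cepstral track that rises by 1 per frame, the interior deltas are exactly 1 and the interior delta-deltas are 0.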
The feature vector stream is then processed through an adaptive, energy-based speech detector to discard low-energy vectors. 
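The exact detector is not described, so the following Python sketch uses a simple stand-in: a frame is kept when its log energy lies within a fixed dynamic range of the loudest frame in the same segment, which makes the threshold adapt to each recording's level. The 30 dB range and the function names are hypothetical.

```python
import math


def frame_log_energy_db(samples):
    """Mean-square energy of one frame in dB (small floor avoids log of zero)."""
    e = sum(s * s for s in samples) / len(samples)
    return 10.0 * math.log10(e + 1e-12)


def drop_low_energy(features, energies_db, dynamic_range_db=30.0):
    """Keep frames whose energy is within dynamic_range_db of the segment's loudest frame."""
    threshold = max(energies_db) - dynamic_range_db
    return [f for f, e in zip(features, energies_db) if e >= threshold]
```

Real detectors of this kind often track the noise floor or fit an energy histogram instead of using the maximum; that is a refinement not shown here.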
The silence-removed features are processed with feature mapping and, finally,
normalized by removing the global mean and dividing by the standard deviation.
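In the usual formulation of feature mapping, each frame is assigned to its top-scoring component i in a channel-dependent GMM and mapped into the channel-independent space as y = (x − μ_i^CD)·(σ_i^CI / σ_i^CD) + μ_i^CI. The Python sketch below shows that mapping followed by per-dimension mean and variance normalization. It assumes the channel of the segment has already been detected, that component i of the channel-dependent GMM was adapted from component i of the channel-independent GMM, and that "global" means over all retained frames of the segment. The dictionary layout and function names are hypothetical.

```python
import math
import statistics


def _log_weighted_density(x, w, mu, var):
    """log(w * N(x; mu, diag(var)))"""
    return math.log(w) - 0.5 * sum(
        math.log(2.0 * math.pi * v) + (xd - m) ** 2 / v for xd, m, v in zip(x, mu, var)
    )


def feature_map(x, cd_gmm, ci_gmm):
    """Map frame x from the detected channel's GMM space to the channel-independent one.
    cd_gmm / ci_gmm: dicts with 'w', 'mu', 'var' lists sharing component indices."""
    best = max(
        range(len(cd_gmm["w"])),
        key=lambda i: _log_weighted_density(x, cd_gmm["w"][i], cd_gmm["mu"][i], cd_gmm["var"][i]),
    )
    return [
        (xd - m_cd) * math.sqrt(v_ci / v_cd) + m_ci
        for xd, m_cd, v_cd, m_ci, v_ci in zip(
            x, cd_gmm["mu"][best], cd_gmm["var"][best], ci_gmm["mu"][best], ci_gmm["var"][best]
        )
    ]


def mean_var_normalize(frames):
    """Per dimension: subtract the mean over all frames and divide by the standard deviation."""
    dims = list(zip(*frames))
    means = [statistics.fmean(d) for d in dims]
    stds = [statistics.pstdev(d) or 1.0 for d in dims]  # guard constant dimensions
    return [[(v - m) / s for v, m, s in zip(f, means, stds)] for f in frames]
```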
The background model used for all targets is a gender-independent, 2048-mixture GMM trained using data
from NIST 1999 and NIST 2001. Target models are derived by Bayesian adaptation (a.k.a. MAP estimation)
of the UBM parameters using the designated training data.
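Bayesian (MAP) adaptation of the UBM is often applied to the means only, with a relevance factor r controlling how far each component moves toward the training data: μ̂_i = α_i·E_i[x] + (1 − α_i)·μ_i with α_i = n_i / (n_i + r). The Python sketch below implements that mean-only variant. Adapting only the means and using r = 16 are assumptions; the description does not say which parameters are adapted. The function names are hypothetical.

```python
import math


def _log_weighted_density(x, w, mu, var):
    """log(w * N(x; mu, diag(var)))"""
    return math.log(w) - 0.5 * sum(
        math.log(2.0 * math.pi * v) + (xd - m) ** 2 / v for xd, m, v in zip(x, mu, var)
    )


def map_adapt_means(frames, weights, means, variances, relevance=16.0):
    """Mean-only MAP adaptation of a diagonal-covariance UBM to a speaker's frames."""
    M, D = len(weights), len(means[0])
    n = [0.0] * M                          # soft counts n_i
    first = [[0.0] * D for _ in range(M)]  # sum_t P(i|x_t) * x_t
    for x in frames:
        logs = [_log_weighted_density(x, weights[i], means[i], variances[i]) for i in range(M)]
        top = max(logs)
        post = [math.exp(l - top) for l in logs]
        z = sum(post)
        for i in range(M):
            p = post[i] / z                # posterior P(i | x_t)
            n[i] += p
            for d in range(D):
                first[i][d] += p * x[d]
    # alpha_i * E_i[x] + (1 - alpha_i) * mu_i with alpha_i = n_i / (n_i + r)
    # simplifies to (first_i + r * mu_i) / (n_i + r), which also handles n_i = 0.
    return [
        [(first[i][d] + relevance * means[i][d]) / (n[i] + relevance) for d in range(D)]
        for i in range(M)
    ]
```

Components that see little training data keep means close to the UBM, which is what makes target and UBM scores directly comparable.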

Time for creating models: 27.2 hours
Time for processing test segments: 35.5 hours
CPU: Intel Pentium 4, 3.0 GHz; Memory: 1 GB

In the IOA-2 system, the probability distribution of each target speaker is modeled by a Gaussian mixture model (GMM).
A universal background model (UBM), also a GMM, is used as the alternative-hypothesis model, and target models are derived from it using Bayesian adaptation.
Feature mapping and T-norm (test normalization) were applied.
A 16-dimensional mel-cepstral vector is extracted from the speech signal every 10 ms using a 20 ms window.
Band-limiting is performed by retaining only the filterbank outputs in the 120 Hz to 3600 Hz frequency range.
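The band limits are the one front-end parameter that differs between the IOA-1 and IOA-2 descriptions. A quick way to compare them is the width of each retained band on the mel scale; the Python sketch below assumes the common mel formula 2595·log10(1 + f/700), which the description does not specify, and the function name is hypothetical.

```python
import math


def hz_to_mel(f):
    return 2595.0 * math.log10(1.0 + f / 700.0)


def mel_bandwidth(f_low, f_high):
    """Width of the retained band on the mel scale."""
    return hz_to_mel(f_high) - hz_to_mel(f_low)


# Band limits as stated for the two systems.
BANDS = {"IOA-1": (100.0, 3800.0), "IOA-2": (120.0, 3600.0)}

for name, (lo, hi) in BANDS.items():
    print(f"{name}: {lo:.0f}-{hi:.0f} Hz spans {mel_bandwidth(lo, hi):.1f} mel")
```

Under this formula the IOA-2 band is roughly 4% narrower on the mel scale than the IOA-1 band.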
Delta and delta-delta cepstra are then computed over a ±2 frame span and appended to the cepstral vector,
producing a 48-dimensional feature vector. The feature vector stream is then processed through an adaptive,
energy-based speech detector to discard low-energy vectors.
The silence-removed features are processed with feature mapping and, finally, normalized
by removing the global mean and dividing by the standard deviation.
The background model used for all targets is a gender-independent, 2048-mixture GMM trained using data
from NIST 1999 and NIST 2001. Target models are derived by Bayesian adaptation (a.k.a. MAP estimation)
of the UBM parameters using the designated training data.

Time for creating models: 28.1 hours
Time for processing test segments: 36.2 hours
CPU: Intel Pentium 4, 3.0 GHz; Memory: 1 GB