d-Ear SR System Description

1. Front end processing
   - 20-ms frame length with 10-ms frame shift
   - Adaptive SS-based denoising
   - Energy based End Point Detection
   - 16-dimension MFCC plus Delta
   - CMS + CVN

2. UBM training (GMM-UBM based framework)
   - Two gender-dependent UBMs, each contains 1024 components
   - Training data comes from NIST SRE'04 set

3. Speaker model training
   - Adapted from gender-dependent UBM with only mean-modified by MAP

4. Score normalization
   - T-Norm with selected imposters from NIST SRE'04 set

5. Multi-speaker processing
   - GLR-based speech segmentation
   - Differential model scores based speaker clustering

** The confidence scores can be interpreted as likelihood ratios.

Hardware Description
   - Single CPU (Intel Pentium-4 at 3.0 GHz)
   - Total memory installed: 1.0 GB

Execution Time (in "Multiple of Real-Time for the data processed")

1. For creating speaker models
   - 1conv4w: 1.0 / 32.89

2. For processing test segments
   - 1conv4w_1conv4w: 1.0 / 45.96
   - 1conv4w_10sec4w: 1.0 / 20.55
   - 1conv4w_1conv2w: 1.0 / 9.95
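
Illustrative sketch (not the d-Ear implementation): steps 3 and 4 above name mean-only MAP adaptation and T-Norm without giving details, so the Python below shows one common textbook form of each. Everything specific here is an assumption: the function names, the diagonal-covariance GMM layout, the relevance factor of 16, and the use of the population standard deviation over the cohort scores. None of these values are stated in the description.

```python
import math
from statistics import mean, pstdev


def log_gauss_diag(x, mu, var):
    """Log density of a diagonal-covariance Gaussian evaluated at vector x."""
    return sum(-0.5 * (math.log(2 * math.pi * v) + (xi - m) ** 2 / v)
               for xi, m, v in zip(x, mu, var))


def map_adapt_means(frames, weights, means, variances, relevance=16.0):
    """Mean-only MAP adaptation of a diagonal GMM.

    Weights and variances are left as in the UBM; only the means move.
    relevance is a hypothetical default, not a value from the description.
    """
    n_comp, dim = len(means), len(means[0])
    occ = [0.0] * n_comp                           # soft counts per component
    first = [[0.0] * dim for _ in range(n_comp)]   # posterior-weighted sums
    for x in frames:
        logp = [math.log(w) + log_gauss_diag(x, m, v)
                for w, m, v in zip(weights, means, variances)]
        top = max(logp)                            # subtract max for stability
        post = [math.exp(lp - top) for lp in logp]
        z = sum(post)
        for k in range(n_comp):
            g = post[k] / z                        # component posterior
            occ[k] += g
            for d in range(dim):
                first[k][d] += g * x[d]
    adapted = []
    for k in range(n_comp):
        if occ[k] > 0.0:
            alpha = occ[k] / (occ[k] + relevance)  # data-dependent mixing weight
            ex = [s / occ[k] for s in first[k]]    # posterior-weighted data mean
            adapted.append([alpha * e + (1.0 - alpha) * m
                            for e, m in zip(ex, means[k])])
        else:
            adapted.append(list(means[k]))         # unseen component: keep UBM mean
    return adapted


def t_norm(raw_score, cohort_scores):
    """Normalize a trial score by the mean and spread of impostor-model scores
    computed on the same test segment."""
    sigma = pstdev(cohort_scores)
    if sigma == 0.0:
        raise ValueError("cohort scores have zero spread")
    return (raw_score - mean(cohort_scores)) / sigma
```

In use, map_adapt_means would be called once per target speaker with that speaker's training frames and the matching gender-dependent UBM parameters, and t_norm once per trial with the scores of the selected impostor models on the test segment.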