Data-Driven Temporal Filters and Alternatives to GMM in Speaker Verification

✍ Scribed by Narendranath Malayath; Hynek Hermansky; Sachin Kajarekar; B. Yegnanarayana


Publisher: Elsevier Science
Year: 2000
Language: English
File size: 259 KB
Volume: 10
Category: Article
ISSN: 1051-2004

✦ Synopsis


This paper discusses the research directions pursued jointly at the Anthropic Signal Processing Group of the Oregon Graduate Institute and at the Speech and Vision Laboratory of the Indian Institute of Technology Madras. Current methods for speaker verification are based on modeling the speaker characteristics using Gaussian mixture models (GMM). The performance of these systems degrades significantly if the target speakers use a telephone handset different from the one used during training. Conventional methods for channel normalization include utterance-based mean subtraction (MS) and RelAtive SpecTrAl (RASTA) filtering. In this paper we introduce a novel method for designing filters that are capable of normalizing the variability introduced by different telephone handsets. The design of the filter is based on the estimated second-order statistics of handset variability. This filter is applied to the logarithmic energy outputs of Mel-spaced filter banks. We also demonstrate the effectiveness of the proposed channel-normalizing filter in improving speaker verification performance in mismatched conditions. GMM-based systems often use thousands of mixture components and hence require a large number of parameters to characterize each target speaker. To address this issue, we propose an alternative to GMM for modeling speaker characteristics. The alternative is based on speaker-specific mapping, and it relies on a speaker-independent representation of speech.
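To make the channel normalization step concrete, here is a minimal Python sketch of utterance-based mean subtraction applied to the log energies of Mel filter-bank outputs, together with a generic FIR filter run along the time trajectory of each band. It is an illustration under stated assumptions, not the filter designed in the paper: the function names, the toy numbers, and the `taps` argument are all hypothetical, and the data-driven design of the filter coefficients from handset statistics is not reproduced here. The sketch assumes the handset acts as a fixed linear channel, so that its effect is a constant additive offset per band in the log-energy domain.

```python
from typing import List

# frames[t][b]: log energy of Mel band b at frame t (hypothetical layout).
Frames = List[List[float]]


def mean_subtract(frames: Frames) -> Frames:
    """Utterance-based mean subtraction: remove each band's mean over time."""
    if not frames:
        return []
    n_frames = len(frames)
    n_bands = len(frames[0])
    means = [sum(f[b] for f in frames) / n_frames for b in range(n_bands)]
    return [[f[b] - means[b] for b in range(n_bands)] for f in frames]


def temporal_filter(frames: Frames, taps: List[float]) -> Frames:
    """Apply one causal FIR filter along time to every band trajectory.

    `taps` is a hypothetical placeholder for the filter coefficients.
    Frames before the start of the utterance are treated as zero.
    """
    if not frames:
        return []
    n_bands = len(frames[0])
    out = []
    for t in range(len(frames)):
        row = []
        for b in range(n_bands):
            acc = 0.0
            for k, h in enumerate(taps):
                if t - k >= 0:
                    acc += h * frames[t - k][b]
            row.append(acc)
        out.append(row)
    return out


# Toy utterance: 4 frames, 2 Mel bands (made-up numbers).
clean = [[1.0, 2.0], [2.0, 0.0], [3.0, 1.0], [2.0, 1.0]]

# Assumed handset effect: a constant offset per band in the log domain.
handset_offset = [0.5, -1.5]
shifted = [[x + o for x, o in zip(frame, handset_offset)] for frame in clean]

ms_clean = mean_subtract(clean)
ms_shifted = mean_subtract(shifted)

# A first-difference filter, used here only as a simple example of taps.
diff_clean = temporal_filter(clean, [1.0, -1.0])
diff_shifted = temporal_filter(shifted, [1.0, -1.0])
```

In this toy case `ms_clean` and `ms_shifted` coincide, because a constant per-band offset is exactly what mean subtraction removes. The first-difference filter also cancels the offset in every frame except the first, where the missing past frame is treated as zero.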