✦ LIBER ✦

Multiple Speaker Tracking and Detection: Handset Normalization and Duration Scoring

✍ Scribed by Kemal Sönmez; Larry Heck; Mitchel Weintraub

Publisher: Elsevier Science
Year: 2000
Tongue: English
Weight: 128 KB
Volume: 10
Category: Article
ISSN: 1051-2004
DOI: 10.1006/dspr.1999.0368


✦ Synopsis

We describe SRI's speaker tracking and detection system in the NIST 1998 Speaker Detection and Tracking Development Evaluation. The system is designed for tracking Switchboard conversations and uses a two-speaker and silence hidden Markov model (HMM) with a minimum state duration constraint and Gaussian mixture model (GMM) state distributions adapted from a single gender- and handset-independent imposter model distribution. Speaker tracking is used to segment waveforms for speaker detection, which is carried out by averaging frame scores of the Viterbi path and normalizing for handset variation via a novel parameter interpolation extension of HNORM for use with waveform segments of arbitrary lengths. A short-duration penalty to augment the acoustic scores is also introduced via a nonlinear combination function. Results on the NIST 1998 Speaker Detection and Tracking Development Evaluation dataset are reported.
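
The handset normalization step mentioned in the synopsis can be illustrated with a minimal sketch. Standard HNORM rescales a raw detection score by the mean and standard deviation of imposter scores observed on the detected handset type. The synopsis does not say how the parameters are interpolated for segments of arbitrary length, so the linear interpolation over segment duration below is an assumption, and the function name `hnorm_score`, the table `HNORM_TABLE`, and all numeric values are hypothetical placeholders rather than figures from the article.

```python
import numpy as np

# Hypothetical imposter-score statistics per handset type, measured at a few
# reference segment durations (seconds). All values are illustrative only.
HNORM_TABLE = {
    "carbon": {
        "durations": np.array([3.0, 10.0, 30.0]),
        "means": np.array([-0.20, -0.10, -0.05]),
        "stds": np.array([0.40, 0.25, 0.15]),
    },
    "electret": {
        "durations": np.array([3.0, 10.0, 30.0]),
        "means": np.array([-0.30, -0.15, -0.10]),
        "stds": np.array([0.50, 0.30, 0.20]),
    },
}


def hnorm_score(raw_score, duration, handset, table=HNORM_TABLE):
    """Normalize a raw score using handset statistics interpolated at `duration`.

    Assumption: parameters are linearly interpolated between reference
    durations. np.interp holds the end values outside the reference range.
    """
    params = table[handset]
    mu = np.interp(duration, params["durations"], params["means"])
    sigma = np.interp(duration, params["durations"], params["stds"])
    return float((raw_score - mu) / sigma)


# A 20-second carbon-handset segment falls between the 10 s and 30 s entries.
normalized = hnorm_score(raw_score=0.125, duration=20.0, handset="carbon")
```

With such a scheme, segments produced by the tracker can be normalized whatever their length, instead of requiring one fixed test duration per set of statistics.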

📜 SIMILAR VOLUMES

Local Normalization and Delayed Decision Making in Speaker Detection and Tracking

✍ Johan Koolwaaij; Lou Boves 📂 Article 📅 2000 🏛 Elsevier Science 🌐 English ⚖ 268 KB

This paper describes A2RT's speaker detection and tracking system and its performance on the 1999 NIST speaker recognition evaluation data. The system does not consist of concatenated modules such as, for instance, silence-speech detection, handset and gender detection, and finally speaker detection