✦ LIBER ✦

On local time–frequency features of speech and their employment in speaker verification

✍ Scribed by Robert M. Nickel; William J. Williams

Publisher: Elsevier Science
Year: 2000
Tongue: English
Weight: 252 KB
Volume: 337
Category: Article
ISSN: 0016-0032
DOI: 10.1016/s0016-0032(00)00032-6

✦ Synopsis

Commonly used robust speaker verification systems are based on time-varying autoregressive spectral estimation (AR) combined with hidden Markov modeling (HMM) or dynamic time warping (DTW). An exhaustive optimization of these methods in the past has culminated in quite reliable verification schemes. It seems unlikely, though, that further significant improvements are readily obtained along the same path. While short-time AR-modeling focuses on the time-varying spectral envelope of an utterance, we are introducing a new method that focuses on high-resolution estimates of the time-varying spectral structure of individual pitch periods. The new method employs reduced interference time–frequency distributions (RIDs) in combination with a scale and translation invariant pattern recognition technique (STIR). The new method by itself does not deliver better results than commonly used techniques; however, it is shown that an acceptance/rejection decision derived from both AR-DTW and RID–STIR features greatly improves the performance of the verification system.