๐”– Bobbio Scriptorium
โœฆ   LIBER   โœฆ

N-Best-based unsupervised speaker adaptation for speech recognition

โœ Scribed by Tomoko Matsui; Sadaoki Furui


Publisher
Elsevier Science
Year
1998
Tongue
English
Weight
251 KB
Volume
12
Category
Article
ISSN
0885-2308

No coin nor oath required. For personal study only.

✦ Synopsis


This paper proposes an instantaneous speaker adaptation method that uses N-best decoding for continuous-mixture-density hidden-Markov-model-based speech-recognition systems. The method is effective even for speakers whose decoding using speaker-independent (SI) models is error-prone and for whom speaker adaptation techniques are truly needed. In addition, smoothed estimation and utterance verification are introduced into the method. The smoothed estimation is based on the likelihood values, under the adapted models, of the word sequences obtained by N-best decoding, and it improves performance for error-prone speakers; the utterance verification technique reduces the amount of computation required. Performance evaluation using connected-digit (four-digit string) recognition experiments performed over actual telephone lines showed a 36.4% reduction in the error rates of speakers whose decoding using SI models is error-prone.
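The smoothed estimation described above can be sketched in outline: each of the N-best hypothesized transcriptions yields its own adapted parameter estimate, and the final parameters are a weighted combination of these estimates, with weights proportional to the likelihoods of the hypotheses under the adapted models. The sketch below is a minimal illustration of that weighting step only; the function name, the data layout, and the use of a plain weighted average are assumptions for illustration, not the paper's exact formulation.

```python
import math

def smoothed_adaptation(nbest):
    """Likelihood-weighted smoothing over N-best hypotheses (hypothetical sketch).

    nbest: list of (log_likelihood, adapted_params) pairs, where adapted_params
    is the parameter vector (e.g. mixture means, flattened) obtained by adapting
    the SI model to one hypothesized word sequence.
    Returns a single smoothed parameter vector: the average of the per-hypothesis
    estimates, each weighted in proportion to exp(log_likelihood).
    """
    # Subtract the maximum log-likelihood before exponentiating, for stability.
    max_ll = max(ll for ll, _ in nbest)
    weights = [math.exp(ll - max_ll) for ll, _ in nbest]
    total = sum(weights)

    dim = len(nbest[0][1])
    smoothed = [0.0] * dim
    for w, (_, params) in zip(weights, nbest):
        for d in range(dim):
            smoothed[d] += (w / total) * params[d]
    return smoothed
```

With this weighting, a hypothesis that the adapted models score much higher dominates the estimate, while near-ties are averaged, which is what makes the scheme robust when the first-best hypothesis is wrong for error-prone speakers.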


📜 SIMILAR VOLUMES


Automatic selection of phonetically dist
โœ Jia-lin Shen; Hsin-min Wang; Ren-yuan Lyu; Lin-shan Lee ๐Ÿ“‚ Article ๐Ÿ“… 1999 ๐Ÿ› Elsevier Science ๐ŸŒ English โš– 163 KB

This paper presents an approach of automatic selection of phonetically distributed sentence sets for speaker adaptation, and applies the concept to the task of Mandarin speech recognition with very large vocabulary. This is a different approach to the adaptation data selection problem. A computer al

Interpolation of n-gram and mutual-infor
โœ Z. GuoDong; L. KimTeng ๐Ÿ“‚ Article ๐Ÿ“… 1999 ๐Ÿ› Elsevier Science ๐ŸŒ English โš– 191 KB

While n-gram modeling is simple and dominant in speech recognition, it can only capture the short-distance context dependency within an n-word window where currently the largest practical n for natural language is three. However, many of the context dependencies in natural language occur beyond a th