N-Best-based unsupervised speaker adaptation for speech recognition
Written by Tomoko Matsui; Sadaoki Furui
- Publisher
- Elsevier Science
- Year
- 1998
- Language
- English
- File size
- 251 KB
- Volume
- 12
- Category
- Article
- ISSN
- 0885-2308
Synopsis
This paper proposes an instantaneous speaker adaptation method that uses N-best decoding for continuous mixture-density hidden-Markov-model-based speech-recognition systems. The method is effective even for speakers whose decoding with speaker-independent (SI) models is error-prone and for whom speaker adaptation techniques are truly needed. In addition, smoothed estimation and utterance verification are introduced into the method. The smoothed estimation is based on the likelihood values for adapted models of the word sequences obtained by N-best decoding and improves performance for error-prone speakers, while the utterance verification technique reduces the amount of calculation required. Performance evaluation using connected-digit (four-digit string) recognition experiments performed over actual telephone lines showed a 36.4% reduction in the error rates of speakers whose decoding with SI models is error-prone.
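The synopsis does not give the estimation formula, so the following is only an illustrative sketch of one way likelihood-based smoothing over an N-best list could work. It assumes each hypothesis yields its own adapted mean vector and log-likelihood, and that the smoothed estimate is their likelihood-weighted average. The function names, the weighting scheme, and the sample numbers are all hypothetical and are not taken from the paper.

```python
import math


def posterior_weights(log_likelihoods):
    """Turn N-best log-likelihoods into weights that sum to 1.

    Subtracting the maximum before exponentiating avoids underflow.
    Assumption: the paper's actual weighting may differ.
    """
    peak = max(log_likelihoods)
    exps = [math.exp(ll - peak) for ll in log_likelihoods]
    total = sum(exps)
    return [e / total for e in exps]


def smoothed_mean(adapted_means, log_likelihoods):
    """Likelihood-weighted average of per-hypothesis adapted mean vectors."""
    weights = posterior_weights(log_likelihoods)
    dim = len(adapted_means[0])
    return [
        sum(w * mean[d] for w, mean in zip(weights, adapted_means))
        for d in range(dim)
    ]


# Hypothetical 3-best list: one adapted mean vector and one
# log-likelihood per candidate word sequence.
means = [[1.0, 2.0], [1.2, 1.8], [0.8, 2.4]]
scores = [-100.0, -101.0, -104.0]

weights = posterior_weights(scores)
smoothed = smoothed_mean(means, scores)
print(weights, smoothed)
```

Under this assumption, a wrong top hypothesis cannot dominate the adapted model when competing hypotheses score almost as well, which is the kind of robustness the synopsis attributes to smoothed estimation for error-prone speakers.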