Improvement of noisy speech recognition using a proportional alignment decoding algorithm in the training phase
✍ Scribed by Wei-Wen Hung; Hsiao-Chuan Wang
- Publisher: Elsevier Science
- Year: 1998
- Language: English
- File size: 670 KB
- Volume: 12
- Category: Article
- ISSN: 0885-2308
✦ Synopsis
Modelling the state duration of hidden Markov models (HMMs) can improve the accuracy of decoding the state sequence of an utterance and thereby improve speech recognition accuracy. However, when a speech signal is contaminated by ambient noise, the decoded state sequence may be distorted: it may dwell in some states for too long or too briefly, even with the help of state duration models. This paper presents a proportional alignment decoding (PAD) algorithm for retraining the HMMs. A multi-speaker isolated Mandarin digit recognition task was conducted to demonstrate the effectiveness and robustness of the PAD-based variable duration hidden Markov model (VDHMM/PAD) method. Experimental results show that the discriminative ability of VDHMM/PAD in a noisy environment is significantly enhanced. Moreover, the proposed method outperforms widely used state duration modelling methods, such as those using Poisson, gamma, Gaussian, bounded and nonparametric probability density functions.
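The synopsis does not spell out the PAD procedure, so the following is only an illustrative sketch of the general idea the name suggests: spreading the frames of an utterance over the left-to-right HMM states in proportion to each state's expected duration, so that no state is occupied for far too long or too briefly. The function name `proportional_alignment`, the `mean_durations` input, and the largest-remainder rounding are all assumptions made for illustration and are not taken from the paper.

```python
from typing import List


def proportional_alignment(num_frames: int, mean_durations: List[float]) -> List[int]:
    """Return a left-to-right state index for every frame.

    Hypothetical helper (not the paper's algorithm): each state gets at least
    one frame, and the remaining frames are shared out in proportion to the
    given mean state durations using largest-remainder rounding.
    """
    n = len(mean_durations)
    if n == 0 or num_frames < n:
        raise ValueError("need at least one frame per state")
    if any(d <= 0 for d in mean_durations):
        raise ValueError("mean durations must be positive")

    total = sum(mean_durations)
    spare = num_frames - n  # frames left after one per state
    shares = [spare * d / total for d in mean_durations]
    counts = [1 + int(s) for s in shares]

    # Hand out frames lost to truncation, largest fractional part first.
    leftover = num_frames - sum(counts)
    order = sorted(range(n), key=lambda i: shares[i] - int(shares[i]), reverse=True)
    for i in order[:leftover]:
        counts[i] += 1

    # Expand per-state frame counts into a frame-by-frame state sequence.
    path: List[int] = []
    for state, count in enumerate(counts):
        path.extend([state] * count)
    return path


if __name__ == "__main__":
    # Ten frames over three states whose assumed mean durations are 1 : 2 : 2.
    print(proportional_alignment(10, [1.0, 2.0, 2.0]))
```

In a retraining loop of the kind the synopsis describes, a segmentation like this could stand in for a noise-distorted decoded path when re-estimating state statistics; whether the paper does exactly that is not stated here.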