𝔖 Bobbio Scriptorium
✦   LIBER   ✦

Large scale discriminative training of hidden Markov models for speech recognition

✍ Scribed by P.C. Woodland; D. Povey


Publisher
Elsevier Science
Year
2002
Tongue
English
Weight
197 KB
Volume
16
Category
Article
ISSN
0885-2308


✦ Synopsis


This paper describes, and evaluates on a large scale, a lattice-based framework for discriminative training of large-vocabulary speech recognition systems based on Gaussian mixture hidden Markov models (HMMs). It concentrates on the maximum mutual information estimation (MMIE) criterion, which has been used to train HMM systems for conversational telephone speech transcription using up to 265 hours of training data. These experiments represent the largest-scale application of discriminative training techniques for speech recognition of which the authors are aware.

Details are given of the lattice-based MMIE implementation used with the extended Baum-Welch algorithm, which makes training of such large systems computationally feasible. Techniques for improving generalization using acoustic scaling and weakened language models are discussed. The overall technique has allowed the estimation of triphone and quinphone HMM parameters, leading to significant reductions in word error rate for the transcription of conversational telephone speech relative to our best systems trained using maximum likelihood estimation (MLE). This is in contrast to some previous studies, which concluded that there is little benefit in using discriminative training for the most difficult large-vocabulary speech recognition tasks. The lattice-based MMIE training scheme is also shown to outperform the frame discrimination technique.

Various properties of lattice-based MMIE training are investigated, including comparisons of different lattice processing strategies (full search and exact-match) and the effect of lattice size on performance. Furthermore, a scheme based on the linear interpolation of the MMIE and MLE objective functions is shown to reduce the danger of over-training. HMMs trained with MMIE are shown to benefit as much as MLE-trained HMMs from model adaptation using maximum likelihood linear regression (MLLR). This has allowed the straightforward integration of MMIE-trained HMMs into complex multi-pass systems for transcription of conversational telephone speech, and has contributed to our MMIE-trained systems giving the lowest word error rates in both the 2000 and 2001 NIST Hub5 evaluations.
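The synopsis refers to the extended Baum-Welch (EBW) algorithm used for MMIE parameter re-estimation. As a reading aid only, here is a minimal sketch of the per-Gaussian EBW mean/variance update in the form commonly associated with this line of work; it is not the paper's implementation, the function name is invented, the statistics are scalar for simplicity, and the smoothing heuristic D = E * (denominator occupancy) with E = 2 is an assumption stated in the comments.

```python
# Illustrative sketch (NOT the paper's code): extended Baum-Welch update for
# one scalar Gaussian, given sufficient statistics accumulated separately
# from the numerator (reference transcription) and denominator
# (recognition-lattice) passes of MMIE training.

def ebw_gaussian_update(occ_num, x_num, x2_num,
                        occ_den, x_den, x2_den,
                        mu, var, E=2.0):
    """Return (new_mean, new_variance) for a single scalar Gaussian.

    occ_*  : occupation counts from numerator / denominator lattices
    x_*    : occupation-weighted sums of observations
    x2_*   : occupation-weighted sums of squared observations
    mu,var : current model parameters
    E      : assumed constant controlling the smoothing term D = E * occ_den;
             D is doubled until the updated variance is positive.
    """
    D = E * occ_den
    while True:
        denom = occ_num - occ_den + D
        if denom > 0:
            new_mu = (x_num - x_den + D * mu) / denom
            new_var = ((x2_num - x2_den + D * (var + mu * mu)) / denom
                       - new_mu * new_mu)
            if new_var > 0:
                return new_mu, new_var
        # Grow the per-Gaussian smoothing constant until the update is valid.
        D = 2 * D if D > 0 else 1.0
```

With zero denominator statistics the update reduces to the ordinary maximum-likelihood re-estimate, which is a convenient sanity check on the formula.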


📜 SIMILAR VOLUMES


Hidden Markov model training with contam…
✍ Matassoni, Marco (author); Omologo, Maurizio (author); Giuliani, Diego (author); Sv… 📂 Article 📅 2002 🏛 Academic Press 🌐 English ⚖ 280 KB

A challenging scenario is addressed in which a distant-talking speech recognizer operates in a noisy office environment with model adaptation. The use of a single far microphone as well as that of a microphone array input are investigated. In addition to the benefits from the application of microph…

Hidden Markov model representation of qu…
✍ Kevin Erler; Li Deng 📂 Article 📅 1993 🏛 Elsevier Science 🌐 English ⚖ 707 KB

This paper describes a speech recognizer based on an HMM representation of quantized articulatory features and presents experimental results for its evaluation. Traditional schemes for HMM representation of speech have attempted to model a set of disjoint time segments. In order to create a more rob…