Hidden Markov model training with contaminated speech material for distant-talking speech recognition
- Authors: Matassoni, Marco; Omologo, Maurizio; Giuliani, Diego; Svaizer, Piergiorgio
- Publisher
- Academic Press
- Year
- 2002
- Language
- English
- Size
- 280 KB
- Volume
- 16
- Category
- Article
- ISSN
- 0885-2308
Synopsis
A challenging scenario is addressed in which a distant-talking speech recognizer, supported by model adaptation, operates in a noisy office environment. Both a single far microphone and a microphone array are investigated as input.
In addition to the benefits from the application of microphone array processing, system robustness is improved by training hidden Markov models (HMMs) with a contaminated version of a clean corpus. This artificial corpus is produced by exploiting information extracted from "real world" acoustic scenarios. The resulting models are then used as a starting point for unsupervised incremental adaptation.
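The contamination idea described above is commonly realized by convolving clean utterances with a room impulse response measured in the target environment and adding background noise at a controlled SNR. The following is a minimal sketch of that procedure, not the authors' actual pipeline; the function name, synthetic signals, and parameter choices are illustrative assumptions.

```python
import numpy as np

def contaminate(clean, rir, noise, snr_db):
    """Simulate distant-talking speech: convolve clean speech with a
    room impulse response (RIR), then add background noise at a target
    SNR. All arguments and names here are illustrative, not from the paper."""
    # Reverberate the clean signal by filtering it with the RIR.
    reverberant = np.convolve(clean, rir)[: len(clean)]
    noise = noise[: len(reverberant)]
    # Scale the noise so the reverberant-to-noise power ratio equals snr_db.
    sig_pow = np.mean(reverberant ** 2)
    noise_pow = np.mean(noise ** 2)
    scale = np.sqrt(sig_pow / (noise_pow * 10.0 ** (snr_db / 10.0)))
    return reverberant + scale * noise

# Synthetic stand-ins for a clean utterance, a measured office RIR,
# and recorded background noise (16 kHz, 1 s).
rng = np.random.default_rng(0)
clean = np.sin(2 * np.pi * 440 * np.arange(16000) / 16000)
rir = np.exp(-np.arange(800) / 200.0) * rng.standard_normal(800)
noise = rng.standard_normal(16000)

noisy = contaminate(clean, rir, noise, snr_db=10.0)
```

Models trained on such artificially contaminated data then serve as the starting point for the unsupervised incremental adaptation discussed in the synopsis.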
Experimental results show that the improvements in recognition accuracy due to multiple microphones, HMM training on contaminated speech, and incremental adaptation are additive on a connected digits task. Moreover, the results show that unsupervised incremental adaptation benefits from starting with models trained on contaminated speech. A final contribution of the paper concerns the influence of speech activity detection accuracy, which appears to be relevant when moving towards real applications.
SIMILAR PUBLICATIONS
This paper describes, and evaluates on a large scale, the lattice-based framework for discriminative training of large vocabulary speech recognition systems based on Gaussian mixture hidden Markov models (HMMs). This paper concentrates on the maximum mutual information estimation (MMIE) criterion wh…
A discrete wavelet transform algorithm successively decomposes the input data set. It generates intermediate representations at graded resolutions and leads to a transform domain within which information is multiply resolved in terms of the time-frequency localization of the comp…
This paper describes a speech recognizer based on an HMM representation of quantized articulatory features and presents experimental results for its evaluation. Traditional schemes for HMM representation of speech have attempted to model a set of disjoint time segments. In order to create a more rob…