Hidden Markov model representation of quantized articulatory features for speech recognition
Written by Kevin Erler; Li Deng
- Publisher
- Elsevier Science
- Year
- 1993
- Language
- English
- File size
- 707 KB
- Volume
- 7
- Category
- Article
- ISSN
- 0885-2308
Synopsis
This paper describes a speech recognizer based on an HMM representation of quantized articulatory features and presents experimental results for its evaluation. Traditional schemes for HMM representation of speech have attempted to model a set of disjoint time segments. In order to create a more robust speech recognition system, the speech production system is characterized by a set of articulatory features, each of which is allowed to vary over a range of discrete values. Each configuration of the articulatory system is characterized by a particular combination of feature values. "Target configurations" of the articulatory system are those configurations which produce the distinctive homogeneous segments in the acoustic signal. As the production system moves from one target configuration to the next, the feature values are permitted to vary independently and asynchronously (with appropriate constraints); the intermediate feature combinations are referred to as "transitional configurations". This avoids the abrupt model changes inherent in non-overlapping segment modeling. The feature value combinations that occur while in transit between target configurations represent the coarticulation intervals between the two targets. This scheme is implemented using an ergodic HMM to control the evolution of the feature values as the system moves from one target configuration to the next. Speech recognition results show that the new system outperforms the traditional HMM approaches in small tasks. Examination of the source of error, using Viterbi analysis, in both the new model and in traditional HMM recognition schemes suggests that this new scheme is able to achieve better modeling of the acoustic transitions and coarticulation in speech.
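The idea of transitional configurations can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `transitional_configurations`, the three-feature tuples, and the simplifying assumption that each feature switches exactly once, directly from its source value to its destination value, are all hypothetical. The synopsis only says that features vary independently and asynchronously under "appropriate constraints", which are not specified here.

```python
from itertools import product


def transitional_configurations(source, target):
    """Enumerate feature-value combinations lying between two target configurations.

    Assumption (not from the paper): every feature either still holds its
    source value or has already switched to its target value, and features
    switch independently of one another.
    """
    # For each feature, the allowed values during the transition.
    choices = [(s,) if s == t else (s, t) for s, t in zip(source, target)]
    # All combinations of per-feature choices, excluding the two endpoints.
    return [
        config
        for config in product(*choices)
        if config != tuple(source) and config != tuple(target)
    ]


# Hypothetical targets with three quantized features each.
source_target = (0, 1, 2)
next_target = (1, 1, 0)
intermediate = transitional_configurations(source_target, next_target)
```

Under these assumptions, each intermediate combination would correspond to one HMM state covering part of the coarticulation interval between the two targets, rather than an abrupt jump from one segment model to the next.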
SIMILAR VOLUMES
This paper describes, and evaluates on a large scale, the lattice based framework for discriminative training of large vocabulary speech recognition systems based on Gaussian mixture hidden Markov models (HMMs). This paper concentrates on the maximum mutual information estimation (MMIE) criterion wh
A challenging scenario is addressed in which a distant-talking speech recognizer operates in a noisy office environment with model adaptation. The use of a single far microphone as well as that of a microphone array input are investigated. In addition to the benefits from the application of microph