
Continuous hidden Markov models integrating transitional and instantaneous features for Mandarin syllable recognition

✍ Authors: Yumin Lee; Lin-shan Lee


Publisher: Elsevier Science
Year: 1993
Language: English
File size: 653 KB
Volume: 7
Category: Article
ISSN: 0885-2308


✦ Synopsis


Feature parameters describing the spectral transitions of speech signals have been successfully integrated with instantaneous features in many approaches proposed for speech recognition, and significant performance improvements have been attained. Most of these methods are designed for recognition systems based on dynamic time warping (DTW) or discrete hidden Markov models (HMMs). However, it has been experimentally shown that for the difficult problem of recognizing the highly confusing Mandarin syllables with a limited amount of training data, the performance of DTW and discrete HMM techniques is much worse than that of continuous HMMs. In this paper, the performance of continuous HMMs using one type of transitional feature in speaker-dependent recognition of the highly confusing Mandarin syllables is first evaluated and discussed in detail under the constraint of very limited training data. Three approaches are then proposed for integrating the instantaneous and transitional features in recognition systems based on continuous HMMs: the straightforward concatenation-integration approach, in which the instantaneous and transitional feature vectors are simply concatenated; the two-maximization approach, in which the output distribution functions for the instantaneous and transitional feature vectors are maximized separately; and the two-model approach, in which two HMMs, one for the instantaneous and one for the transitional feature vectors, are trained independently and their log likelihoods are summed with proper weighting. Extensive experiments and careful analysis show that each of the three approaches provides attractive performance under different conditions.
For example, with the two-maximization approach a recognition rate (93.89%) only slightly lower than the highest rate achievable with the concatenation-integration approach (94.36% at M = 5) can be obtained with a much smaller number of mixtures (M = 2).
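The scoring step of the two-model approach described above, in which the log likelihoods of two independently trained HMMs are combined with a weight, can be sketched as follows. This is a minimal illustration, not the paper's implementation: the function names, the weight value, and the toy log-likelihood scores are all invented for the example.

```python
def two_model_score(loglik_inst, loglik_trans, weight=0.5):
    """Weighted sum of the log likelihoods produced by the two
    independently trained HMMs (instantaneous and transitional features).

    The weight is a tunable parameter; 0.5 gives both feature streams
    equal influence (value chosen here for illustration only)."""
    return weight * loglik_inst + (1.0 - weight) * loglik_trans


def recognize(candidates, weight=0.5):
    """Return the candidate syllable with the highest combined score.

    candidates: dict mapping syllable -> (loglik_inst, loglik_trans),
    i.e. the log likelihoods of the utterance under that syllable's
    instantaneous-feature HMM and transitional-feature HMM."""
    return max(candidates,
               key=lambda s: two_model_score(*candidates[s], weight=weight))


# Toy example with made-up log likelihoods for two candidate syllables:
scores = {
    "ba": (-120.0, -90.0),   # strong transitional-feature match
    "pa": (-110.0, -105.0),  # strong instantaneous-feature match
}
print(recognize(scores))  # equal weighting favours "ba" (-105.0 vs -107.5)
```

Because the two models are trained separately, the weight can be tuned on held-out data without retraining either HMM, which is one practical attraction of this approach.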