A speaker-independent word recognition based on HMM using orthogonalized phonetic segment codebook
✍ Scribed by Hiroshi Matsuura; Tsuneo Nitta
- Publisher: John Wiley and Sons
- Year: 1994
- Language: English
- File size: 833 KB
- Volume: 25
- Category: Article
- ISSN: 0882-1666
✦ Synopsis
Abstract
Matrix quantization (MQ) is a method that directly quantizes the spectrum‐time pattern. However, its quantization error is relatively large compared with that of vector quantization (VQ), since the dimension is large and the pattern variation is less.
From this viewpoint, this paper introduces an acoustic/phonetic structure called the phonetic segment as the unit of MQ. Statistical matrix quantization (SMQ) is applied to the calculation of the error measure, employing the subspace method, a statistical pattern recognition technique. The purpose of SMQ is to account effectively for pattern variation by constructing an orthogonalized phonetic segment codebook based on the eigenvector set representing the pattern variation of each phonetic segment. The training of the HMM using the phonetic segment code sequence is also considered.
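As an illustration of the subspace method mentioned above, the following minimal sketch scores a flattened spectrum‐time pattern against each phonetic segment by the squared length of its projection onto that segment's orthonormal axes. It is not the authors' implementation: the function names, the two toy segment labels, the 4‐dimensional patterns, and the hand‐made axes are all assumptions. In practice the axes would be the leading eigenvectors estimated from training patterns of each segment.

```python
def subspace_similarity(pattern, basis):
    """Fraction of the pattern's energy that lies in the subspace.

    `basis` is assumed to be a list of orthonormal axes (for example,
    the leading eigenvectors of one segment's training patterns).
    """
    norm_sq = sum(v * v for v in pattern)
    if norm_sq == 0.0:
        return 0.0
    projected = 0.0
    for axis in basis:
        dot = sum(p * a for p, a in zip(pattern, axis))
        projected += dot * dot
    return projected / norm_sq


def rank_segments(pattern, codebook):
    """Return segment labels ordered from most to least similar."""
    scores = {label: subspace_similarity(pattern, basis)
              for label, basis in codebook.items()}
    return sorted(scores, key=scores.get, reverse=True)


# Hypothetical codebook: two segments, each with two orthonormal axes.
codebook = {
    "a": [[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0]],
    "k": [[0.0, 0.0, 1.0, 0.0], [0.0, 0.0, 0.0, 1.0]],
}

pattern = [0.9, 0.3, 0.1, 0.0]  # toy flattened spectrum-time pattern
ranking = rank_segments(pattern, codebook)
print(ranking)
```

Because each segment is represented by several axes rather than a single centroid, a pattern can vary within the subspace without lowering its score, which is the sense in which the codebook accounts for pattern variation.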
K‐best learning is proposed, in which the first through K‐th phonetic segment sequences are handled equally. Although K‐best learning is much simpler than VQ, it has equal or better output probability smoothing power and can suppress the effect of errors in converting speech to the phonetic segment code sequence.
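One way to picture the smoothing effect of K‐best learning is the sketch below, which estimates discrete output probabilities from frames whose HMM state is already known. This is an interpretation, not the paper's procedure: it assumes a fixed state alignment in place of full HMM re‐estimation, and it reads "handled equally" as giving each of the top K candidate codes one count. The function name and the toy frames are hypothetical.

```python
from collections import defaultdict


def kbest_output_probs(aligned_frames, k):
    """Estimate per-state output probabilities from K-best code lists.

    `aligned_frames` holds (state, ranked_codes) pairs; each of the
    top-k codes of a frame receives the same count (an assumption).
    """
    counts = defaultdict(lambda: defaultdict(float))
    for state, ranked_codes in aligned_frames:
        for code in ranked_codes[:k]:
            counts[state][code] += 1.0
    probs = {}
    for state, code_counts in counts.items():
        total = sum(code_counts.values())
        probs[state] = {c: n / total for c, n in code_counts.items()}
    return probs


# Hypothetical frames: state index and codes ranked by similarity.
frames = [
    (0, ["a", "o", "e"]),
    (0, ["a", "e", "o"]),
    (1, ["k", "t", "p"]),
]

one_best = kbest_output_probs(frames, k=1)
two_best = kbest_output_probs(frames, k=2)
print(one_best[0])  # all probability mass on the top-ranked code
print(two_best[0])  # mass spread over second-ranked codes as well
```

With K = 1, a code that was never ranked first in training gets zero probability, so a single quantization error at recognition time can zero out a path. With K = 2, second‐ranked codes receive nonzero probability, which is the kind of smoothing the abstract describes.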
Using the (SMQ/HMM + K‐best learning) method, a high speaker‐independent word recognition performance of 96.0 percent is obtained on a 100‐word data set containing similar word pairs, uttered by 10 unknown speakers.