We address the problem of speech compression at very low rates, with the short-term spectrum compressed to less than 20 bits per frame. Current techniques apply structured vector quantization (VQ) to the short-term synthesis filter coefficients to achieve rates of the order of 24 to 26 bits per fram
Tree-structured vector quantization for speech recognition
โ Scribed by M. Barszcz; W. Chen; G. Boulianne; P. Kenny
- Publisher
- Elsevier Science
- Year
- 2000
- Tongue
- English
- Weight
- 90 KB
- Volume
- 14
- Category
- Article
- ISSN
- 0885-2308
No coin nor oath required. For personal study only.
โฆ Synopsis
We describe some new methods for constructing discrete acoustic phonetic hidden Markov models (HMMs) using tree quantizers having very large numbers (16-64 K) of leaf nodes and tree-structured smoothing techniques. We consider two criteria for constructing tree quantizers (minimum distortion and minimum entropy) and three types of smoothing (mixture smoothing, smoothing by adding 1 and Gaussian smoothing). We show that these methods are capable of achieving recognition accuracies which are generally comparable to those obtained with Gaussian mixture HMMs at a computational cost which is only marginally greater than that of conventional discrete HMMs. We present some evidence of superior performance in situations where the number of HMM distributions to be estimated is small compared with the amount of training data. We also show how our methods can accommodate feature vectors of much higher dimensionality than are traditionally used in speech recognition.
๐ SIMILAR VOLUMES
We have already proposed the application of tree-structured speaker clustering to supervised speaker adaptation. This paper proposes its application to unsupervised speaker adaptation and speakerindependent (SI) speech recognition. This clustering involves the selection of a speaker cluster from amo
In this paper, we propose the use of a tree-trellis search scheme for the task of large vocabulary Mandarin polysyllabic word recognition. Usually, the task of large vocabulary word recognition is computationally intractable by whole-word based approach. We convert this task into a tree network sear