We have already proposed the application of tree-structured speaker clustering to supervised speaker adaptation. This paper proposes its application to unsupervised speaker adaptation and speaker-independent (SI) speech recognition. This clustering involves the selection of a speaker cluster from amo
Predictor codebook for speaker-independent speech recognition
Scribed by Takeshi Kawabata
- Book ID: 104591609
- Publisher: John Wiley and Sons
- Year: 1994
- Language: English
- File size: 752 KB
- Volume: 25
- Category: Article
- ISSN: 0882-1666
Synopsis
Abstract
This paper discusses a method for handling the diverse dynamic features of speech: the dynamic features are represented by spectrum predictors, and a codebook is constructed whose elements are those predictors. The effectiveness of the method for speaker-independent speech recognition is examined. Three kinds of predictor structure are examined: the forward predictor, the backward predictor, and the interpolator. The predictor codebook is constructed by the predictor quantization procedure, which is a small modification of the LBG algorithm. To evaluate performance at the phoneme recognition level, two kinds of statistical evaluation quantities and the phoneme recognition rate are considered. The results show that the predictor codebook achieves high phoneme separation capability and robustness against speaker variation. Performance at the continuous speech recognition level is then evaluated by incorporating the process into a phrase recognition system. In both cases, the codebook with backward predictors as its elements exhibited the highest performance.
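The abstract gives no formulas, so the following is only an illustrative sketch of what an LBG-style "predictor quantization" loop could look like, not the paper's actual procedure. It assumes each codebook entry is a diagonal linear predictor `a` with `cur[d] ≈ a[d] * ctx[d]`, that the distortion is the squared prediction error, and that the codebook size is a power of two. The names `fit_predictor`, `quantize`, and `train_predictor_codebook` are hypothetical. The context frame `ctx` is left to the caller: on one common reading it would be the preceding frame for a forward predictor and the following frame for a backward predictor.

```python
def fit_predictor(pairs, dim):
    """Least-squares diagonal predictor for a cell: cur[d] ~ a[d] * ctx[d]."""
    a = []
    for d in range(dim):
        num = sum(ctx[d] * cur[d] for ctx, cur in pairs)
        den = sum(ctx[d] ** 2 for ctx, _ in pairs)
        a.append(num / den if den > 0 else 0.0)
    return a


def prediction_error(a, ctx, cur):
    """Squared error when predictor `a` predicts `cur` from `ctx`."""
    return sum((c - w * x) ** 2 for w, x, c in zip(a, ctx, cur))


def quantize(codebook, ctx, cur):
    """Index of the predictor with the smallest prediction error."""
    errors = [prediction_error(a, ctx, cur) for a in codebook]
    return errors.index(min(errors))


def train_predictor_codebook(pairs, size, n_iter=20, eps=0.05):
    """LBG-style training; `size` is assumed to be a power of two."""
    dim = len(pairs[0][0])
    # Start from a single predictor fitted to all (context, current) pairs.
    codebook = [fit_predictor(pairs, dim)]
    while len(codebook) < size:
        # Split every predictor into two perturbed copies, as in LBG.
        codebook = [[w + s * eps for w in a] for a in codebook for s in (1, -1)]
        for _ in range(n_iter):
            # Assignment step: each pair goes to its best predictor.
            cells = [[] for _ in codebook]
            for ctx, cur in pairs:
                cells[quantize(codebook, ctx, cur)].append((ctx, cur))
            # Update step: refit each predictor on its cell (keep it if empty).
            codebook = [fit_predictor(cell, dim) if cell else a
                        for a, cell in zip(codebook, cells)]
    return codebook
```

The only change from ordinary LBG vector quantization in this sketch is that centroids are replaced by predictors and the Euclidean distance to a centroid is replaced by the prediction error, which matches the abstract's description of the procedure as "a small modification of the LBG algorithm" only in spirit.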
SIMILAR VOLUMES
Matrix quantization (MQ) is a method which directly quantizes the spectrum-time pattern. However, it has a problem in that the quantization error is relatively large compared with vector quantization (VQ), since the dimension is large and the pattern variation is less. From such a vi
This paper proposes an instantaneous speaker adaptation method that uses N-best decoding for continuous mixture-density hidden-Markov-model-based speech-recognition systems. This method is effective even for speakers whose decoding using speaker-independent (SI) models is error-prone and for whom sp