In hands-free speech recognition in which the user speaks at a distance from the microphone, the accuracy of recognition is degraded if the environment is reverberant. The reason for the degradation is that the uttered speech is affected by the surrounding noise and reverberation, which produces a m
Speaker recognition using HMM composition in noisy environments
Scribed by Tomoko Matsui; Tomohito Kanno; Sadaoki Furui
- Publisher: Elsevier Science
- Year: 1996
- Language: English
- File size: 119 KB
- Volume: 10
- Category: Article
- ISSN: 0885-2308
Synopsis
This paper investigates a speaker recognition method that is robust against background noise. In noisy environments, one important issue is how to create a model for each speaker so as to compensate for noise. The method described here is based on hidden Markov model (HMM) composition, which combines a speaker HMM and a noise-source HMM into a noise-added speaker HMM with a particular signal-to-noise ratio (SNR). Since it is difficult to measure the SNR of input speech with non-stationary noise exactly, this method creates several noise-added speaker HMMs with various SNRs. The HMM that has the highest likelihood value for the input speech is selected, and a speaker decision is made using this likelihood value.
Experimental application of this method to text-independent speaker identification and verification in various kinds of noisy environments demonstrated considerable improvement in speaker recognition for speech utterances of male speakers.
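The selection step described in the synopsis (build several noise-added models at different SNRs, keep the one with the highest likelihood, then decide on the speaker) can be illustrated with a deliberately simplified sketch. It is not the paper's implementation: each "HMM" is reduced here to a single diagonal Gaussian over linear power-spectrum features, where speech and noise are assumed to add. All names (`compose`, `log_likelihood`, `best_snr_score`, `identify`) and the candidate SNR grid are hypothetical.

```python
import math

def compose(speech_mean, speech_var, noise_mean, noise_var, snr_db):
    """Combine speech and noise Gaussians at a target SNR.

    Assumption: features are linear power spectra, so independent speech
    and noise add, and so do their means and variances.
    """
    # Scale the noise power by k so that speech power / scaled noise power
    # matches the requested SNR.
    k = sum(speech_mean) / (sum(noise_mean) * 10 ** (snr_db / 10))
    mean = [s + k * n for s, n in zip(speech_mean, noise_mean)]
    var = [sv + k * k * nv for sv, nv in zip(speech_var, noise_var)]
    return mean, var

def log_likelihood(frames, mean, var):
    """Log-likelihood of all frames under a diagonal Gaussian."""
    total = 0.0
    for frame in frames:
        for x, m, v in zip(frame, mean, var):
            total -= 0.5 * (math.log(2 * math.pi * v) + (x - m) ** 2 / v)
    return total

def best_snr_score(frames, speaker, noise, snr_candidates):
    """Return (snr, log-likelihood) of the best noise-added model."""
    scored = [
        (log_likelihood(frames, *compose(*speaker, *noise, snr)), snr)
        for snr in snr_candidates
    ]
    ll, snr = max(scored)
    return snr, ll

def identify(frames, speakers, noise, snr_candidates):
    """Pick the speaker whose best noise-added model scores highest."""
    scores = {
        name: best_snr_score(frames, model, noise, snr_candidates)[1]
        for name, model in speakers.items()
    }
    return max(scores, key=scores.get), scores

# Toy (mean, variance) models, invented purely for illustration.
speakers = {
    "A": ([4.0, 1.0], [0.5, 0.5]),
    "B": ([1.0, 4.0], [0.5, 0.5]),
}
noise = ([1.0, 1.0], [0.1, 0.1])
```

For verification rather than identification, the same best-SNR likelihood would be compared against a threshold instead of against other speakers. A real system would compose full multi-state HMMs, usually after mapping cepstral parameters back to the linear spectral domain.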
SIMILAR VOLUMES
This paper proposes an unsupervised noisy environment adaptation algorithm based on the HMM acoustic model, using MLLR and a multispeaker database. An arbitrary single sentence uttered by the target speaker, together with living room noise, is used as the input, and the data with superposed noise fo
Modelling the state duration of hidden Markov models (HMMs) can effectively improve the accuracy in decoding the state sequence of an utterance and result in an improvement of speech recognition accuracy. However, when a speech signal is contaminated by ambient noise, the decoded state sequence may