Near-field Adaptive Beamformer for Robust Speech Recognition
Authors: Iain A. McCowan; Darren C. Moore; S. Sridharan
- Publisher: Elsevier Science
- Year: 2002
- Language: English
- File size: 237 KB
- Volume: 12
- Category: Article
- ISSN: 1051-2004
Synopsis
This paper investigates a new microphone array processing technique specifically for the purpose of speech enhancement and recognition. The main objective of the proposed technique is to improve the low frequency directivity of a conventional adaptive beamformer, as low frequency performance is critical in speech processing applications. The proposed technique, termed near-field adaptive beamforming (NFAB), is implemented using the standard generalized sidelobe canceler (GSC) system structure, where a near-field superdirective (NFSD) beamformer is used as the fixed upper-path beamformer to improve the low frequency performance. In addition, to minimize signal leakage into the adaptive noise canceling path for near-field sources, a compensation unit is introduced prior to the blocking matrix. The advantage of the technique is verified by comparing the directivity patterns with those of conventional filter-sum, NFSD, and GSC systems. In speech enhancement and recognition experiments, the proposed technique outperforms the standard techniques for a near-field source in adverse noise conditions. © 2002 Elsevier Science (USA)
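For readers unfamiliar with the GSC structure named in the synopsis, the following is a minimal time-domain sketch of a generic GSC, not the paper's NFAB. It rests on several assumptions: the fixed upper path is a plain channel average (delay-and-sum) rather than the NFSD beamformer; the input channels are taken to be already time-aligned and gain-matched to the target, which stands in for the near-field compensation unit; the blocking matrix is a simple adjacent-channel difference; and the adaptive path uses NLMS. The function name `gsc_nlms` and its parameters are hypothetical.

```python
import numpy as np


def gsc_nlms(x, num_taps=8, mu=0.1, eps=1e-8):
    """Generic GSC sketch (assumed design, not the paper's NFAB).

    x: array of shape (num_mics, num_samples), assumed already aligned
       and gain-matched to the target source.
    Returns the enhanced single-channel output.
    """
    num_mics, num_samples = x.shape

    # Fixed upper path: channel average (delay-and-sum on aligned inputs).
    fixed = x.mean(axis=0)

    # Blocking matrix: adjacent-channel differences cancel the aligned
    # target, leaving noise-only reference signals.
    blocked = x[:-1] - x[1:]  # shape (num_mics - 1, num_samples)

    w = np.zeros((num_mics - 1, num_taps))  # adaptive filter weights
    out = np.zeros(num_samples)

    for n in range(num_samples):
        # Most recent num_taps samples of each reference, newest first,
        # zero-padded at the start of the signal.
        lo = max(0, n - num_taps + 1)
        seg = blocked[:, lo:n + 1][:, ::-1]
        u = np.zeros((num_mics - 1, num_taps))
        u[:, :seg.shape[1]] = seg

        # Output = fixed beamformer minus the adaptive noise estimate.
        y = fixed[n] - np.sum(w * u)

        # NLMS update driven by the output sample.
        w += mu * y * u / (np.sum(u * u) + eps)
        out[n] = y

    return out
```

If the target leaks through the blocking matrix, as happens for a near-field source without compensation, the adaptive path in a structure like this tends to cancel part of the target as well; this is the leakage problem the synopsis says the compensation unit addresses.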
SIMILAR VOLUMES
In this paper, we present a family of maximum likelihood (ML) techniques that aim at reducing an acoustic mismatch between the training and testing conditions of hidden Markov model (HMM)-based automatic speech recognition (ASR) systems. Our study is conducted in two phases. In the first phase, we e
This paper proposes an instantaneous speaker adaptation method that uses N-best decoding for continuous mixture-density hidden-Markov-model-based speech-recognition systems. This method is effective even for speakers whose decoding using speaker-independent (SI) models is error-prone and for whom sp
This paper compares techniques for asynchronous fusion of speech and lip information for robust speaker identification. In any fusion system, the ultimate challenge is to determine the optimal way to combine all information sources under varying conditions. We propose a new method for estimating con