✦ LIBER ✦

Cochannel speaker count labelling based on the use of cepstral and pitch prediction derived features

✍ Scribed by Michael A. Lewis; Ravi P. Ramachandran

Publisher: Elsevier Science
Year: 2001
Tongue: English
Weight: 136 KB
Volume: 34
Category: Article
ISSN: 0031-3203
DOI: 10.1016/s0031-3203(00)00004-2


✦ Synopsis

Cochannel interference of speech signals is a common practical problem, particularly in tactical communications. Ideally, the individual speech signals would be separated. However, it is known that when two equal-bandwidth signals are added, such a separation is not possible. We instead examine the problem of labelling temporal regions, or frames, as either one-speaker or two-speaker speech. This identification is important in making automatic speaker and speech recognition systems more robust. It is based on feature extraction followed by classification, as is done in pattern recognition. The research covers both the closed-set problem, where the identities of the two interfering speakers are known a priori, and the more difficult open-set problem, where the identities are not known (speaker independent).

For the feature extraction step, we propose a new pitch prediction feature (PPF), which is compared with the linear predictive cepstral coefficients (LPCC) and the mel frequency cepstral coefficients (MFCC). The features are computed and classified on a frame-by-frame basis. We compare the performance of two classifiers, namely the neural tree network (NTN) and the vector quantizer (VQ).

The results show that in both the closed- and open-set cases, (1) the VQ is the better classifier and (2) the PPF outperforms both the MFCC and LPCC features. The superiority of the PPF comes with two added benefits. It is a scalar feature, as opposed to the 12-dimensional vector LPCC and MFCC features, and it requires a smaller VQ codebook.
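The frame-by-frame VQ labelling described above can be illustrated with a minimal sketch. It assumes one codebook per class ("one-speaker" and "two-speaker"), each trained with plain k-means (Lloyd iterations), and a minimum-distortion decision using squared Euclidean distance. The synopsis does not state the training algorithm, distance measure, or codebook sizes. The helper names `train_codebook`, `distortion` and `label_frames`, the codebook size of 4, and the synthetic scalar features are all hypothetical and only stand in for a real PPF, LPCC or MFCC front end.

```python
import random
from typing import List, Sequence

Vector = Sequence[float]


def sq_dist(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def train_codebook(data: List[Vector], size: int, iters: int = 20,
                   seed: int = 0) -> List[List[float]]:
    """Hypothetical k-means codebook trainer; codewords start as random training frames."""
    rng = random.Random(seed)
    codebook = [list(v) for v in rng.sample(data, size)]
    for _ in range(iters):
        # Assign every training frame to its nearest codeword.
        cells: List[List[Vector]] = [[] for _ in range(size)]
        for v in data:
            k = min(range(size), key=lambda j: sq_dist(v, codebook[j]))
            cells[k].append(v)
        # Move each codeword to the centroid of its cell (keep it if the cell is empty).
        for k, members in enumerate(cells):
            if members:
                dim = len(members[0])
                codebook[k] = [sum(m[d] for m in members) / len(members)
                               for d in range(dim)]
    return codebook


def distortion(frame: Vector, codebook: List[List[float]]) -> float:
    """Distortion of one frame: distance to its nearest codeword."""
    return min(sq_dist(frame, c) for c in codebook)


def label_frames(frames: List[Vector], cb_one: List[List[float]],
                 cb_two: List[List[float]]) -> List[int]:
    """Label each frame 1 (one speaker) or 2 (two speakers) by lower distortion."""
    return [1 if distortion(f, cb_one) <= distortion(f, cb_two) else 2
            for f in frames]


if __name__ == "__main__":
    # Synthetic, well-separated scalar features (not real PPF values).
    rng = random.Random(1)
    one = [[rng.gauss(2.0, 0.3)] for _ in range(200)]
    two = [[rng.gauss(-2.0, 0.3)] for _ in range(200)]
    cb1, cb2 = train_codebook(one, 4), train_codebook(two, 4)
    print(label_frames([[2.0], [-2.0]], cb1, cb2))
```

The same functions accept vectors of any length, so a 12-dimensional cepstral frame and a scalar feature wrapped as a one-element list go through identical code. A scalar feature also makes each distance computation cheaper, which is loosely in line with the synopsis's point about the PPF's lower dimensionality and smaller codebook.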