We describe SRI's speaker tracking and detection system in the NIST 1998 Speaker Detection and Tracking Development Evaluation. The system is designed for tracking Switchboard conversations and uses a two-speaker and silence hidden Markov model (HMM) with a minimum state duration constraint and Gauss
Local Normalization and Delayed Decision Making in Speaker Detection and Tracking
Scribed by Johan Koolwaaij; Lou Boves
- Publisher: Elsevier Science
- Year: 2000
- Language: English
- File size: 268 KB
- Volume: 10
- Category: Article
- ISSN: 1051-2004
Synopsis
This paper describes A2RT's speaker detection and tracking system and its performance on the 1999 NIST speaker recognition evaluation data. The system is not a chain of concatenated modules (for instance, silence-speech detection, then handset and gender detection, and finally speaker detection or tracking) in which each module builds on the hard decisions of the previous ones. Instead, it applies the principle of delayed decision making and postpones all hard decisions until the final stage of the detection process. This paper focuses on two important locality issues in detecting or tracking speakers in a telephone conversation, where the speaker change frequency is usually high. First, channel estimation needs segments that are sufficiently long but homogeneous; several kinds of local channel normalization are compared. Second, local estimation of speaker likelihoods depends critically on the segmentation of the conversation. The experiments show that a global level of segmentation improves speaker tracking performance, whereas a more detailed segmentation is needed for speaker detection, because likelihood computation over clusters of segments depends on the purity of the segments. Furthermore, choosing the appropriate type of channel normalization can give a small but consistent improvement in speaker tracking performance.
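The synopsis does not say which kinds of local channel normalization were compared. As a rough illustration of the general idea only, the sketch below assumes one common choice: subtracting a per-coefficient mean computed over a sliding window of cepstral frames. A stationary convolutive channel shows up as an additive offset in the cepstral domain, so a windowed mean subtraction removes it locally. The function name `local_mean_normalize` and the `half_window` parameter are hypothetical and are not taken from the paper.

```python
from typing import List, Sequence


def local_mean_normalize(
    frames: Sequence[Sequence[float]], half_window: int
) -> List[List[float]]:
    """Subtract from each frame the per-coefficient mean of the frames in a
    centered window of at most 2 * half_window + 1 frames.

    Assumption: this windowed mean subtraction stands in for "local channel
    normalization"; the paper may use different variants.
    """
    if half_window < 0:
        raise ValueError("half_window must be non-negative")
    n = len(frames)
    normalized: List[List[float]] = []
    for t in range(n):
        # The window is clipped at the start and end of the recording.
        lo = max(0, t - half_window)
        hi = min(n, t + half_window + 1)
        window = frames[lo:hi]
        dim = len(frames[t])
        means = [sum(f[d] for f in window) / len(window) for d in range(dim)]
        normalized.append([frames[t][d] - means[d] for d in range(dim)])
    return normalized


if __name__ == "__main__":
    clean = [[0.0], [2.0], [4.0]]
    # Same frames with a constant (channel-like) offset added.
    shifted = [[5.0], [7.0], [9.0]]
    print(local_mean_normalize(clean, 1))
    print(local_mean_normalize(shifted, 1))
```

Both calls print the same normalized frames, since the constant offset cancels inside every window. The window length carries the trade-off the synopsis points at: a longer window gives a more stable estimate, but it is more likely to span a speaker change and so stop being homogeneous.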
SIMILAR VOLUMES
Two approaches to detecting and tracking speakers in multispeaker audio are described. Both approaches use an adapted Gaussian mixture model, universal background model (GMM-UBM) speaker detection system as the core speaker recognition engine. In one approach, the individual log-likelihood ratio sco
This article presents the text-independent speaker detection and tracking systems developed by the members of the ELISA Consortium for the NIST'99 speaker recognition evaluation campaign. ELISA is a consortium grouping researchers of several laboratories sharing software modules, resources and exper
Abstract. Objective: The purpose of the current study was to examine decision making in female patients with binge eating disorder (BED) in comparison with obese and normal weight women. Method: In the study, 20 patients with BED, 21 obese women without BED and 34 healthy women participate