Machine Learning for Speaker Recognition

✍ Scribed by Man-Wai Mak, Jen-Tzung Chien


Publisher: Cambridge University Press
Year: 2020
Tongue: English
Leaves: 329
Category: Library

✦ Synopsis


This book will help readers understand fundamental and advanced statistical models and deep learning models for robust speaker recognition and domain adaptation. This useful toolkit enables readers to apply machine learning techniques to address practical issues, such as robustness under adverse acoustic environments and domain mismatch, when deploying speaker recognition systems. Presenting state-of-the-art machine learning techniques for speaker recognition and featuring a range of probabilistic models, learning algorithms, case studies, and new trends and directions for speaker recognition based on modern machine learning and deep learning, this is the perfect resource for graduates, researchers, practitioners and engineers in electrical engineering, computer science and applied mathematics.

✦ Table of Contents


Contents
Preface
List of Abbreviations
Notations
Part I Fundamental Theories
1 Introduction
1.1 Fundamentals of Speaker Recognition
1.2 Feature Extraction
1.3 Speaker Modeling and Scoring
1.3.1 Speaker Modeling
1.3.2 Speaker Scoring
1.4 Modern Speaker Recognition Approaches
1.5 Performance Measures
1.5.1 FAR, FRR, and DET
1.5.2 Decision Cost Function
2 Learning Algorithms
2.1 Fundamentals of Statistical Learning
2.1.1 Probabilistic Models
2.1.2 Neural Networks
2.2 Expectation-Maximization Algorithm
2.2.1 Maximum Likelihood
2.2.2 Iterative Procedure
2.2.3 Alternative Perspective
2.2.4 Maximum A Posteriori
2.3 Approximate Inference
2.3.1 Variational Distribution
2.3.2 Factorized Distribution
2.3.3 EM versus VB-EM Algorithms
2.4 Sampling Methods
2.4.1 Markov Chain Monte Carlo
2.4.2 Gibbs Sampling
2.5 Bayesian Learning
2.5.1 Model Regularization
2.5.2 Bayesian Speaker Recognition
3 Machine Learning Models
3.1 Gaussian Mixture Models
3.1.1 The EM Algorithm
3.1.2 Universal Background Models
3.1.3 MAP Adaptation
3.1.4 GMM–UBM Scoring
3.2 Gaussian Mixture Model–Support Vector Machines
3.2.1 Support Vector Machines
3.2.2 GMM Supervectors
3.2.3 GMM–SVM Scoring
3.2.4 Nuisance Attribute Projection
3.3 Factor Analysis
3.3.1 Generative Model
3.3.2 EM Formulation
3.3.3 Relationship with Principal Component Analysis
3.3.4 Relationship with Nuisance Attribute Projection
3.4 Probabilistic Linear Discriminant Analysis
3.4.1 Generative Model
3.4.2 EM Formulations
3.4.3 PLDA Scoring
3.4.4 Enhancement of PLDA
3.4.5 Alternative to PLDA
3.5 Heavy-Tailed PLDA
3.5.1 Generative Model
3.5.2 Posteriors of Latent Variables
3.5.3 Model Parameter Estimation
3.5.4 Scoring in Heavy-Tailed PLDA
3.5.5 Heavy-Tailed PLDA versus Gaussian PLDA
3.6 I-Vectors
3.6.1 Generative Model
3.6.2 Posterior Distributions of Total Factors
3.6.3 I-Vector Extractor
3.6.4 Relation with MAP Adaptation in GMM–UBM
3.6.5 I-Vector Preprocessing for Gaussian PLDA
3.6.6 Session Variability Suppression
3.6.7 PLDA versus Cosine-Distance Scoring
3.6.8 Effect of Utterance Length
3.6.9 Gaussian PLDA with Uncertainty Propagation
3.6.10 Senone I-Vectors
3.7 Joint Factor Analysis
3.7.1 Generative Model of JFA
3.7.2 Posterior Distributions of Latent Factors
3.7.3 Model Parameter Estimation
3.7.4 JFA Scoring
3.7.5 From JFA to I-Vectors
Part II Advanced Studies
4 Deep Learning Models
4.1 Restricted Boltzmann Machine
4.1.1 Distribution Functions
4.1.2 Learning Algorithm
4.2 Deep Neural Networks
4.2.1 Structural Data Representation
4.2.2 Multilayer Perceptron
4.2.3 Error Backpropagation Algorithm
4.2.4 Interpretation and Implementation
4.3 Deep Belief Networks
4.3.1 Training Procedure
4.3.2 Greedy Training
4.3.3 Deep Boltzmann Machine
4.4 Stacked Autoencoder
4.4.1 Denoising Autoencoder
4.4.2 Greedy Layer-Wise Learning
4.5 Variational Autoencoder
4.5.1 Model Construction
4.5.2 Model Optimization
4.5.3 Autoencoding Variational Bayes
4.6 Generative Adversarial Networks
4.6.1 Generative Models
4.6.2 Adversarial Learning
4.6.3 Optimization Procedure
4.6.4 Gradient Vanishing and Mode Collapse
4.6.5 Adversarial Autoencoder
4.7 Deep Transfer Learning
4.7.1 Transfer Learning
4.7.2 Domain Adaptation
4.7.3 Maximum Mean Discrepancy
4.7.4 Neural Transfer Learning
5 Robust Speaker Verification
5.1 DNN for Speaker Verification
5.1.1 Bottleneck Features
5.1.2 DNN for I-Vector Extraction
5.2 Speaker Embedding
5.2.1 X-Vectors
5.2.2 Meta-Embedding
5.3 Robust PLDA
5.3.1 SNR-Invariant PLDA
5.3.2 Duration-Invariant PLDA
5.3.3 SNR- and Duration-Invariant PLDA
5.4 Mixture of PLDA
5.4.1 SNR-Independent Mixture of PLDA
5.4.2 SNR-Dependent Mixture of PLDA
5.4.3 DNN-Driven Mixture of PLDA
5.5 Multi-Task DNN for Score Calibration
5.5.1 Quality Measure Functions
5.5.2 DNN-Based Score Calibration
5.6 SNR-Invariant Multi-Task DNN
5.6.1 Hierarchical Regression DNN
5.6.2 Multi-Task DNN
6 Domain Adaptation
6.1 Overview of Domain Adaptation
6.2 Feature-Domain Adaptation/Compensation
6.2.1 Inter-Dataset Variability Compensation
6.2.2 Dataset-Invariant Covariance Normalization
6.2.3 Within-Class Covariance Correction
6.2.4 Source-Normalized LDA
6.2.5 Nonstandard Total-Factor Prior
6.2.6 Aligning Second-Order Statistics
6.2.7 Adaptation of I-Vector Extractor
6.2.8 Appending Auxiliary Features to I-Vectors
6.2.9 Nonlinear Transformation of I-Vectors
6.2.10 Domain-Dependent I-Vector Whitening
6.3 Adaptation of PLDA Models
6.4 Maximum Mean Discrepancy Based DNN
6.4.1 Maximum Mean Discrepancy
6.4.2 Domain-Invariant Autoencoder
6.4.3 Nuisance-Attribute Autoencoder
6.5 Variational Autoencoders (VAE)
6.5.1 VAE Scoring
6.5.2 Semi-Supervised VAE for Domain Adaptation
6.5.3 Variational Representation of Utterances
6.6 Generative Adversarial Networks for Domain Adaptation
7 Dimension Reduction and Data Augmentation
7.1 Variational Manifold PLDA
7.1.1 Stochastic Neighbor Embedding
7.1.2 Variational Manifold Learning
7.2 Adversarial Manifold PLDA
7.2.1 Auxiliary Classifier GAN
7.2.2 Adversarial Manifold Learning
7.3 Adversarial Augmentation PLDA
7.3.1 Cosine Generative Adversarial Network
7.3.2 PLDA Generative Adversarial Network
7.4 Concluding Remarks
8 Future Direction
8.1 Time-Domain Feature Learning
8.2 Speaker Embedding from End-to-End Systems
8.3 VAE–GAN for Domain Adaptation
8.3.1 Variational Domain Adversarial Neural Network (VDANN)
8.3.2 Relationship with Domain Adversarial Neural Network (DANN)
8.3.3 Gaussianality Analysis
Appendix: Exercises
References
Index


📜 SIMILAR VOLUMES


Machine Learning Systems for Multimodal
✍ Markus Kächele 📂 Library 📅 2020 🏛 Springer Fachmedien Wiesbaden;Springer Vieweg 🌐 English

Markus Kächele offers a detailed view on the different steps in the affective computing pipeline, ranging from corpus design and recording over annotation and feature extraction to post-processing, classification of individual modalities and fusion in the context of ensemble classifiers. He fo

Self-Learning Speaker Identification: A
✍ Tobias Herbig, Franz Gerl, Wolfgang Minker (auth.) 📂 Library 📅 2011 🏛 Springer-Verlag Berlin Heidelberg 🌐 English

Current speech recognition systems suffer from variation of voice characteristics between speakers as they are usually based on speaker independent speech models. In order to resolve this issue, adaptation methods have been developed in many state-of-the-art systems. However,

Self-Learning Speaker Identification A S
✍ Herbig, Tobias;Gerl, Franz;Minker, Wolfgang 📂 Library 📅 2011 🏛 Springer Berlin Heidelberg 🌐 English

Current speech recognition systems are based on speaker independent speech models and suffer from inter-speaker variations in speech signal characteristics. This work develops an integrated approach for speech and speaker recognition in order to gain space for self-learning opportunities of the syst

Machine Learning and Deep Learning Techn
✍ Ben Othman Soufiene, Chinmay Chakraborty 📂 Library 📅 2024 🏛 CRC Press 🌐 English

Machine Learning and Deep Learning Techniques for Medical Image Recognition comprehensively reviews deep learning-based algorithms in medical image analysis problems including medical image processing. It includes a detailed review of deep learning approaches for semantic object detection and segmen

Pattern Recognition & Machine Learning
✍ Y. Anzai (Auth.) 📂 Library 📅 1992 🏛 Elsevier Science 🌐 English

This is the first text to provide a unified and self-contained introduction to visual pattern recognition and machine learning. It is useful as a general introduction to artificial intelligence and knowledge engineering, and no previous knowledge of pattern recognition or machine learning is necessar