

Deep Learning Based Speech Quality Prediction

✍ Scribed by Gabriel Mittag


Publisher: Springer
Year: 2022
Tongue: English
Leaves: 171
Series: T-Labs Series in Telecommunication Services
Category: Library


✦ Synopsis


This book shows how recent deep learning methods can be applied to speech quality prediction and provides an in-depth analysis of the suitability of different deep learning architectures for this task. The author then shows how the resulting model outperforms traditional speech quality models and, by predicting the speech quality dimensions of noisiness, coloration, discontinuity, and loudness, provides additional information about the cause of a quality impairment.

✦ Table of Contents


Preface
Acknowledgments
Contents
Acronyms
1 Introduction
1.1 Motivation
1.2 Thesis Objectives and Research Questions
1.3 Outline
2 Quality Assessment of Transmitted Speech
2.1 Speech Communication Networks
2.2 Speech Quality and Speech Quality Dimensions
2.3 Subjective Assessment
2.4 Subjective Assessment via Crowdsourcing
2.5 Traditional Instrumental Methods
2.5.1 Parametric Models
2.5.2 Double-Ended Signal-Based Models
2.5.3 Single-Ended Signal-Based Models
2.6 Machine Learning Based Instrumental Methods
2.6.1 Non-Deep Learning Machine Learning Approaches
2.6.2 Deep Learning Architectures
2.6.3 Deep Learning Based Speech Quality Models
2.7 Summary
3 Neural Network Architectures for Speech Quality Prediction
3.1 Dataset
3.1.1 Source Files
3.1.2 Simulated Distortions
3.1.3 Live Distortions
3.1.4 Listening Experiment
3.2 Overview of Neural Network Model
3.3 Mel-Spec Segmentation
3.4 Framewise Model
3.4.1 CNN
3.4.2 Feedforward Network
3.5 Time-Dependency Modelling
3.5.1 LSTM
3.5.2 Transformer/Self-Attention
3.6 Time Pooling
3.6.1 Average-/Max-Pooling
3.6.2 Last-Step-Pooling
3.6.3 Attention-Pooling
3.7 Experiments and Results
3.7.1 Training and Evaluation Metric
3.7.2 Framewise Model
3.7.3 Time-Dependency Model
3.7.4 Pooling Model
3.8 Summary
4 Double-Ended Speech Quality Prediction Using Siamese Networks
4.1 Introduction
4.2 Method
4.2.1 Siamese Neural Network
4.2.2 Reference Alignment
4.2.3 Feature Fusion
4.3 Results
4.3.1 LSTM vs Self-Attention
4.3.2 Alignment
4.3.3 Feature Fusion
4.3.4 Double-Ended vs Single-Ended
4.4 Summary
5 Prediction of Speech Quality Dimensions with Multi-Task Learning
5.1 Introduction
5.2 Multi-Task Models
5.2.1 Fully Connected (MTL-FC)
5.2.2 Fully Connected + Pooling (MTL-POOL)
5.2.3 Fully Connected + Pooling + Time-Dependency (MTL-TD)
5.2.4 Fully Connected + Pooling + Time-Dependency + CNN (MTL-CNN)
5.3 Results
5.3.1 Per-Task Evaluation
5.3.2 All-Tasks Evaluation
5.3.3 Comparing Dimension
5.3.4 Degradation Decomposition
5.4 Summary
6 Bias-Aware Loss for Training from Multiple Datasets
6.1 Method
6.1.1 Learning with Bias-Aware Loss
6.1.2 Anchoring Predictions
6.2 Experiments and Results
6.2.1 Synthetic Data
6.2.2 Minimum Accuracy r_th
6.2.3 Training Examples with and Without Anchoring
6.2.4 Configuration Comparisons
6.2.5 Speech Quality Dataset
6.3 Summary
7 NISQA: A Single-Ended Speech Quality Model
7.1 Datasets
7.1.1 POLQA Pool
7.1.2 ITU-T P Suppl. 23
7.1.3 Other Datasets
7.1.4 Live-Talking Test Set
7.2 Model and Training
7.2.1 Model
7.2.2 Bias-Aware Loss
7.2.3 Handling Missing Dimension Ratings
7.2.4 Training
7.3 Results
7.3.1 Evaluation Metrics
7.3.2 Validation Set Results: Overall Quality
7.3.3 Validation Set Results: Quality Dimensions
7.3.4 Test Set Results
7.3.5 Impairment Level vs Quality Prediction
7.4 Summary
8 Conclusions
A Dataset Condition Tables
B Train and Validation Dataset Dimension Histograms
References
Index


📜 SIMILAR VOLUMES


Bankruptcy Prediction through Soft Compu
✍ Arindam Chaudhuri, Soumya K Ghosh (auth.) 📂 Library 📅 2017 🏛 Springer Singapore 🌐 English

This book proposes complex hierarchical deep architectures (HDA) for predicting bankruptcy, a topical issue for business and corporate institutions that in the past has been tackled using statistical, market-based and machine-intelligence prediction models. The HDA are formed through fuzzy rou

Deep Learning-Based Pose Estimation for
✍ Sushant Gautam 📂 Library 📅 2023 🏛 Eliva Press 🌐 English

Dystonia is a movement disorder that causes unusual movements and involuntary muscle contractions affecting some parts of the whole body. Selecting drugs and doses is a highly personalized process for dystonia, requiring frequent visits to the clinic, pointing toward the need for more systemat

Applied Deep Learning: A Case-Based Appr
✍ Umberto Michelucci 📂 Library 📅 2018 🏛 Apress 🌐 English

Work with advanced topics in deep learning, such as optimization algorithms, hyper-parameter tuning, dropout, and error analysis as well as strategies to address typical problems encountered when training deep neural networks. You'll begin by studying the activation functions mostly with a single ne
