Speech Dereverberation (Signals and Communication Technology)

✍ Scribed by Patrick A. Naylor (editor), Nikolay D. Gaubitch (editor)

Publisher: Springer
Year: 2010
Language: English
Pages: 399
Category: Library

✦ Synopsis

Speech Dereverberation gathers together an overview, a mathematical formulation of the problem and the state-of-the-art solutions for dereverberation.

Speech Dereverberation presents current approaches to the problem of reverberation. It provides a review of topics in room acoustics and also describes performance measures for dereverberation. The algorithms are then explained with mathematical analysis and examples that enable the reader to see the strengths and weaknesses of the various techniques and to understand the questions still to be addressed. Techniques rooted in speech enhancement are included, in addition to a treatment of multichannel blind acoustic system identification and inversion. The TRINICON framework is shown in the context of dereverberation to be a generalization of the signal processing for a range of analysis and enhancement techniques.

Speech Dereverberation is suitable for students at masters and doctoral level, as well as established researchers.

✦ Table of Contents

Preface
Contents
List of Contributors
1 Introduction
1.1 Background
1.2 Effects of Reverberation
1.3 Speech Acquisition
1.4 System Description
1.5 Acoustic Impulse Responses
1.6 Literature Overview
1.6.1 Beamforming Using Microphone Arrays
1.6.2 Speech Enhancement Approaches to Dereverberation
1.6.3 Blind System Identification and Inversion
1.6.3.1 Blind System Identification
1.6.3.2 Inverse Filtering
1.7 Outline of the Book
References
2 Models, Measurement and Evaluation
2.1 An Overview of Room Acoustics
2.1.1 The Wave Equation
2.1.2 Sound Field in a Reverberant Room
2.1.3 Reverberation Time
2.1.4 The Critical Distance
2.1.5 Analysis of Room Acoustics Dependent on Frequency Range
2.2 Models of Room Reverberation
2.2.1 Intuitive Model
2.2.2 Finite Element Models
2.2.3 Digital Waveguide Mesh
2.2.4 Ray-tracing
2.2.5 Source-image Model
2.2.6 Statistical Room Acoustics
2.3 Subjective Evaluation
2.4 Channel-based Objective Measures
2.4.1 Normalized Projection Misalignment
2.4.2 Direct-to-reverberant Ratio
2.4.3 Early-to-total Sound Energy Ratio
2.4.4 Early-to-late Reverberation Ratio
2.5 Signal-based Objective Measures
2.5.1 Log Spectral Distortion
2.5.2 Bark Spectral Distortion
2.5.3 Reverberation Decay Tail
2.5.4 Signal-to-reverberant Ratio
2.5.4.1 Relationship Between DRR and SRR
2.5.4.2 Level Normalization in SRR
2.5.4.3 SRR Computation Example
2.5.4.4 SRR Summary
2.5.5 Experimental Comparisons
2.6 Dereverberation Performance of the Delay-and-sum Beamformer
2.6.1 Simulation Results: DSB Performance
Experiment 1: Effect of Source-microphone Distance
Experiment 2: Effect of Number of Microphones
2.7 Summary and Discussion
References
3 Speech Dereverberation Using Statistical Reverberation Models
3.1 Introduction
3.2 Review of Dereverberation Methods
3.2.1 Reverberation Cancellation
3.2.2 Reverberation Suppression
3.3 Statistical Reverberation Models
3.3.1 Polack’s Statistical Model
3.3.2 Generalized Statistical Model
3.4 Single-microphone Spectral Enhancement
3.4.1 Problem Formulation
3.4.2 MMSE Log-spectral Amplitude Estimator
3.4.3 a priori SIR Estimator
3.5 Multi-microphone Spectral Enhancement
3.5.1 Problem Formulation
3.5.2 Two Multi-microphone Systems
3.5.2.1 MVDR Beamformer and Single-channel MMSE Estimator
3.5.2.2 Non-linear Spatial Processor
3.5.3 Speech Presence Probability Estimator
3.6 Late Reverberant Spectral Variance Estimator
3.7 Estimating Model Parameters
3.7.1 Reverberation Time
3.7.2 Direct-to-reverberant Ratio
3.8 Experimental Results
3.8.1 Using One Microphone
3.8.2 Using Multiple Microphones
3.9 Summary and Outlook
Acknowledgment
References
4 Dereverberation Using LPC-based Approaches
4.1 Introduction
4.2 Linear Predictive Coding of Speech
4.3 LPC on Reverberant Speech
4.3.1 Effects of Reverberation on the LPC Coefficients
4.3.1.1 Single Microphone
4.3.1.2 Joint Multichannel Optimization
4.3.1.3 LPC at the Output of a Delay-and-sum Beamformer
4.3.2 Effects of Reverberation on the Prediction Residual
4.3.3 Simulation Examples for LPC on Reverberant Speech
4.4 Dereverberation Employing LPC
4.4.1 Regional Weighting Function
4.4.2 Weighting Function Based on Hilbert Envelopes
4.4.3 Wavelet Extrema Clustering
4.4.4 Weight Function from Coarse Channel Estimates
4.4.5 Kurtosis Maximizing Adaptive Filter
4.5 Spatiotemporal Averaging Method for Enhancement of Reverberant Speech
4.5.1 Larynx Cycle Segmentation with Multichannel DYPSA
4.5.2 Time Delay of Arrival Estimation for Spatial Averaging
4.5.3 Voiced/Unvoiced/Silence Detection
4.5.4 Weighted Inter-cycle Averaging
4.5.5 Dereverberation Results
4.6 Summary
Appendix A
References
5 Multi-microphone Speech Dereverberation Using Eigen-decomposition
5.1 Introduction
5.2 Problem Formulation
5.3 Preliminaries
5.4 AIR Estimation – Algorithm Derivation
5.5 Extensions of the Basic Algorithm
5.5.1 Two-microphone Noisy Case
5.5.1.1 White Noise Case
5.5.1.2 Colored Noise Case
5.5.2 Multi-microphone Case (M > 2)
5.5.3 Partial Knowledge of the Null Subspace
5.6 AIR Estimation in Subbands
5.7 Signal Reconstruction
5.8 Experimental Study
5.8.1 Full-band Version – Results
5.8.2 Subband Version – Results
5.9 Limitations of the Proposed Algorithms and Possible Remedies
5.9.1 Noise Robustness
5.9.2 Computational Complexity and Memory Requirements
5.9.3 Common Zeros
5.9.4 The Demand for the Entire AIR Compensation
5.9.5 Filter-bank Design
5.9.6 Gain Ambiguity
5.10 Summary and Conclusions
References
6 Adaptive Blind Multichannel System Identification
6.1 Introduction
6.2 Problem Formulation
6.2.1 Channel Identifiability Conditions
6.3 Review of Adaptive Algorithms for Acoustic BSI Employing Cross-relations
6.3.1 The Multichannel Least Mean Squares Algorithm
6.3.2 The Normalized Multichannel Frequency Domain LMS Algorithm
6.3.3 The Improved Proportionate NMCFLMS Algorithm
6.4 Effect of Noise on the NMCFLMS Algorithm – The Misconvergence Problem
6.5 The Constraint Based ext-NMCFLMS Algorithm
6.5.1 Effect of Noise on the Cost Function
6.5.2 Penalty Term Using the Direct-path Constraint
6.5.3 Delay Estimation
6.5.4 Flattening Point Estimation
6.6 Simulation Results
6.6.1 Experimental Setup
6.6.2 Variation of Convergence Rate on β
6.6.3 Degradation Due to Direct-path Estimation
6.6.4 Comparison of Algorithm Performance Using a WGN Input Signal
6.6.5 Comparison of Algorithm Performance Using Speech Input Signals
6.7 Conclusions
References
7 Subband Inversion of Multichannel Acoustic Systems
7.1 Introduction
7.2 Multichannel Equalization
7.3 Equalization with Inexact Impulse Responses
7.3.1 Effects of System Mismatch
7.3.2 Effects of System Length
7.4 Subband Multichannel Equalization
7.4.1 Oversampled Filter-banks
7.4.2 Subband Decomposition
7.4.3 Subband Multichannel Equalization
7.5 Computational Complexity
7.6 Application to Speech Dereverberation
7.7 Simulations and Results
7.7.1 Experiment 1: Complex Subband Decomposition
7.7.2 Experiment 2: Random Channels
7.7.3 Experiment 3: Simulated Room Impulse Responses
7.7.4 Experiment 4: Speech Dereverberation
7.8 Summary
References
8 Bayesian Single Channel Blind Dereverberation of Speech from a Moving Talker
8.1 Introduction and Overview
8.1.1 Model-based Framework
8.1.1.1 Online vs. Offline Numerical Methods
8.1.1.2 Parametric Estimation and Optimal Filtering Methods
8.1.2 Practical Blind Dereverberation Scenarios
8.1.2.1 Single-sensor Applications
8.1.2.2 Time-varying Acoustic Channels
8.1.3 Chapter Organisation
8.2 Mathematical Problem Formulation
8.2.1 Bayesian Framework for Blind Dereverberation
8.2.2 Classification of Blind Dereverberation Formulations
8.2.3 Numerical Bayesian Methods
8.2.3.1 Markov Chain Monte Carlo
8.2.3.2 Sequential Monte Carlo
8.2.3.3 General Comments
8.2.4 Identifiability
8.3 Nature of Room Acoustics
8.3.1 Regions of the Audible Spectrum
8.3.2 The Room Transfer Function
8.3.3 Issues with Modelling Room Transfer Functions
Long and Non-minimum Phase AIRs
Robustness to Estimation Error and Variation of Inverse of the AIR
Subband and Frequency-zooming Solu
8.4 Parametric Channel Models
8.4.1 Pole-zero and All-zero Models
8.4.2 The Common-acoustical Pole and Zero Model
8.4.3 The All-pole Model
8.4.4 Subband All-pole Modelling
8.4.5 The Nature of Time-varying All-pole Models
8.4.6 Static Modelling of TVAP Parameters
8.4.7 Stochastic Modelling of Acoustic Channels
8.5 Noise and System Model
8.6 Source Model
8.6.1 Speech Production
8.6.2 Time-varying AR Modelling of Unvoiced Speech
8.6.2.1 Statistical Nature of Speech Parameter Variation
8.6.3 Static Block-based Modelling of TVAR Parameters
8.6.3.1 Basis Function Representation
8.6.3.2 Choice of Basis Functions
8.6.3.3 Block-based Time-varying Approach
8.6.4 Stochastic Modelling of TVAR Parameters
8.7 Bayesian Blind Dereverberation Algorithms
8.7.1 Offline Processing Using MCMC
8.7.1.1 Likelihood for Source Signal
8.7.1.2 Complete Likelihood for Observations
8.7.1.3 Prior Distributions of Source, Channel and Error Residual
8.7.1.4 Posterior Distribution of the Channel Parameters
8.7.1.5 Experimental Results
8.7.2 Online Processing Using Sequential Monte Carlo
8.7.2.1 Source and Channel Model
8.7.2.2 Conditionally Gaussian State Space
8.7.2.3 Methodology
8.7.2.4 Channel Estimation Using Bayesian Channel Updates
8.7.2.5 Experimental Results
8.7.3 Comparison of Offline and Online Approaches
8.8 Conclusions
References
9 Inverse Filtering for Speech Dereverberation Without the Use of Room Acoustics Information
9.1 Introduction
9.2 Inverse Filtering for Speech Dereverberation
9.2.1 Speech Capture Model with Multiple Microphones
9.2.2 Optimal Inverse Filtering
9.2.3 Unsupervised Algorithm to Approximate Optimal Processing
9.3 Approaches to Solving the Over-whitening of the Recovered Speech
9.3.1 Precise Compensation for Over-whitening of Target Speech
9.3.1.1 Principle
9.3.1.2 Close to Perfect Dereverberation
9.3.1.3 Dereverberation and Coherent Noise Reduction
9.3.1.4 Sensitivity to Incoherent N
9.3.2 Late Reflection Removal with Multichannel Multistep LP
9.3.2.1 Principle
9.3.2.2 Speech Dereverberation Performance in Terms of ASR Score
9.3.2.3 Speech Dereverberation in a Noisy Environment
9.3.2.4 Dereverberation of Multiple Sound Source Signals
9.3.3 Joint Estimation of Linear Predictors and Short-time Speech Characteristics
9.3.3.1 Background
9.3.3.2 Principle
9.3.3.3 Algorithms
9.3.4 Probabilistic Model Based Speech Dereverberation
9.3.4.1 Probabilistic Speech Model
9.3.4.2 Likelihood Function for Multichannel LP
9.3.4.3 Autocorrelation Codebook-based Speech Dereverberation
9.4 Concluding Remarks
Appendix A
References
10 TRINICON for Dereverberation of Speech and Audio Signals
10.1 Introduction
10.1.1 Generic Tasks for Blind Adaptive MIMO Filtering
10.1.2 A Compact Matrix Formulation for MIMO Filtering Problems
10.1.3 Overview of this Chapter
10.2 Ideal Inversion Solution and the Direct-inverse Approach to Blind Deconvolution
10.3 Ideal Solution of Direct Adaptive Filtering Problems and the Identification-and-inversion Approach to Blind Deconvolution
10.3.1 Ideal Separation Solution for Two Sources and Two Sensors
10.3.2 Relation to MIMO and SIMO System Identification
10.3.3 Ideal Separation Solution and Optimum Separation Filter Length for an Arbitrary Number of Sources and Sensors
10.3.4 General Scheme for Blind System Identification
10.3.5 Application of Blind System Identification to Blind Deconvolution
10.4 TRINICON – A General Framework for Adaptive MIMO Signal Processing and Application to Blind Adaptation Problems
10.4.1 Matrix Notation for Convolutive Mixtures
10.4.2 Optimization Criterion
10.4.3 Gradient-based Coefficient Update
10.4.3.1 Alternative Formulation of the Gradient-based Coefficient Update
10.4.4 Natural Gradient-based Coefficient Update
10.4.5 Incorporation of Stochastic Source Models
10.4.5.1 Spherically Invariant Random Processes as Signal Model
10.4.5.2 Multivariate Gaussians as Signal Model: Second-order Statistics
10.4.5.3 Nearly Gaussian Densities as Signal Model
10.5 Application of TRINICON to Blind System Identification and the Identification-and-inversion Approach to Blind Deconvolution
10.5.1 Generic Gradient-based Algorithm for Direct Adaptive Filtering Problems
10.5.1.1 Illustration for Second-order Statistics
10.5.2 Realizations for the SIMO Case
10.5.2.1 Coefficient Initialization
10.5.2.2 Efficient Implementation of the Sylvester Constraint for the Special Case of SIMO Models
10.5.3 Efficient Frequency-domain Realizations for the MIMO Case
10.6 Application of TRINICON to the Direct-inverse Approach to Blind Deconvolution
10.6.1 Multichannel Blind Deconvolution
10.6.2 Multichannel Blind Partial Deconvolution
10.6.3 Special Cases and Links to Known Algorithms
10.6.3.1 SIMO vs. MIMO Mixing Systems
10.6.3.2 Efficient Implementation Using the Correlation Method
10.6.3.3 Relations to Some Known HOS Approaches
10.6.3.4 Relations to Some Known SOS Approaches
10.7 Experiments
10.7.1 The SIMO Case
10.7.2 The MIMO Case
10.8 Conclusions
Appendix A: Compact Derivation of the Gradient-based Coefficient Update
Appendix B: Transformation of the Multivariate Output Signal PDF in (10.39) by Blockwise Sylvester Matrix
Appendix C: Polynomial Expansions for Nearly Gaussian Probability Densities
Appendix D: Expansion of the Sylvester Constraints in (10.83)
References
Index

📜 SIMILAR VOLUMES

📁 Speech Enhancement (Signals and Communication Technology)

✍ Jacob Benesty (Editor), Shoji Makino (Editor), Jingdong Chen (Editor) 📂 Library 📅 2005 🏛 Springer 🌐 English

A strong reference on the problem of signal and speech enhancement, describing the newest developments in this exciting field. The general emphasis is on noise reduction, because of the large number of applications that can benefit from this technology.

📁 Speech Enhancement (Signals and Communication Technology)

✍ Jacob Benesty (editor), Shoji Makino (editor), Jingdong Chen (editor) 📂 Library 📅 2005 🏛 Springer 🌐 English

We live in a noisy world! In all applications (telecommunications, hands-free communications, recording, human-machine interfaces, etc.) that require at least one microphone, the signal of interest is usually contaminated by noise and reverberation. As a result, the microphone signal has to

📁 Blind Speech Separation (Signals and Communication Technology)

✍ Shoji Makino, Te-Won Lee, Hiroshi Sawada 📂 Library 📅 2007 🌐 English

This is the world’s first edited book on independent component analysis (ICA)-based blind source separation (BSS) of convolutive mixtures of speech. This book brings together a small number of leading researchers to provide tutorial-like and in-depth treatment on major ICA-based BSS topics, with the

📁 Speech and Audio Processing in Adverse Environments (Signals and Communication Technology)

✍ Eberhard Hänsler, Gerhard Schmidt 📂 Library 📅 2008 🏛 Springer 🌐 English

The book reflects the state of the art in important areas of speech and audio signal processing. It presents topics which are missed so far and most recent findings in the field. Leading international experts report on their field of work and their new results. Considerable amount of space is co

📁 Advances in Speech and Music Technology: Computational Aspects and Applications (Signals and Communication Technology)

✍ Anupam Biswas 📂 Library 🌐 English

📁 Voice and Speech Quality Perception: Assessment and Evaluation (Signals and Communication Technology)

✍ Ute Jekosch 📂 Library 📅 2005 🏛 Springer 🌐 English

Foundations of Voice and Speech Quality Perception starts out with the fundamental question of: "How do listeners perceive voice and speech quality and how can these processes be modeled?" Any quantitative answers require measurements. This is natural for physical quantities but harder to imagine fo