Digital Speech Transmission and Enhancement, 2nd Edition

✍ Scribed by Peter Vary, Dr. Rainer Martin


Publisher: Wiley
Year: 2023
Tongue: English
Leaves: 595
Edition: 2
Category: Library


✦ Synopsis


DIGITAL SPEECH TRANSMISSION AND ENHANCEMENT

Enables readers to understand the latest developments in speech enhancement and transmission made possible by advances in computational power and device miniaturization.

The Second Edition of Digital Speech Transmission and Enhancement has been updated throughout to cover the latest advances in the theory and practice of speech signal processing and its applications, including many new research results, standards, algorithms, and developments that have recently appeared and are on their way into state-of-the-art applications.

Besides mobile communications, which constituted the main application domain of the first edition, speech enhancement for hearing instruments and man-machine interfaces has gained significantly more prominence in the past decade, and as such receives greater focus in this updated and expanded second edition.

Readers can expect to find information and novel methods on:

Low-latency spectral analysis-synthesis, single-channel and dual-channel algorithms for noise reduction and dereverberation
Multi-microphone processing methods, which are now widely used in applications such as mobile phones, hearing aids, and man-computer interfaces
Algorithms for near-end listening enhancement, which provide a significantly increased speech intelligibility for users at the noisy receiving side of their mobile phone
Fundamentals of speech signal processing, estimation and machine learning, speech coding, error concealment by soft decoding, and artificial bandwidth extension of speech signals

Digital Speech Transmission and Enhancement is a single-source, comprehensive guide to the fundamental issues, algorithms, standards, and trends in speech signal processing and speech communication technology. As such, it is an invaluable resource for engineers, researchers, academics, and graduate students in the areas of communications, electrical engineering, and information technology.

✦ Table of Contents


Cover
Title Page
Copyright
Contents
Preface
Chapter 1 Introduction
Chapter 2 Models of Speech Production and Hearing
2.1 Sound Waves
2.2 Organs of Speech Production
2.3 Characteristics of Speech Signals
2.4 Model of Speech Production
2.4.1 Acoustic Tube Model of the Vocal Tract
2.4.2 Discrete Time All‐Pole Model of the Vocal Tract
2.5 Anatomy of Hearing
2.6 Psychoacoustic Properties of the Auditory System
2.6.1 Hearing and Loudness
2.6.2 Spectral Resolution
2.6.3 Masking
2.6.4 Spatial Hearing
2.6.4.1 Head‐Related Impulse Responses and Transfer Functions
2.6.4.2 Law of The First Wavefront
References
Chapter 3 Spectral Transformations
3.1 Fourier Transform of Continuous Signals
3.2 Fourier Transform of Discrete Signals
3.3 Linear Shift Invariant Systems
3.3.1 Frequency Response of LSI Systems
3.4 The z‐transform
3.4.1 Relation to Fourier Transform
3.4.2 Properties of the ROC
3.4.3 Inverse z‐Transform
3.4.4 z‐Transform Analysis of LSI Systems
3.5 The Discrete Fourier Transform
3.5.1 Linear and Cyclic Convolution
3.5.2 The DFT of Windowed Sequences
3.5.3 Spectral Resolution and Zero Padding
3.5.4 The Spectrogram
3.5.5 Fast Computation of the DFT: The FFT
3.5.6 Radix‐2 Decimation‐in‐Time FFT
3.6 Fast Convolution
3.6.1 Fast Convolution of Long Sequences
3.6.2 Fast Convolution by Overlap‐Add
3.6.3 Fast Convolution by Overlap‐Save
3.7 Analysis–Modification–Synthesis Systems
3.8 Cepstral Analysis
3.8.1 Complex Cepstrum
3.8.2 Real Cepstrum
3.8.3 Applications of the Cepstrum
3.8.3.1 Construction of Minimum‐Phase Sequences
3.8.3.2 Deconvolution by Cepstral Mean Subtraction
3.8.3.3 Computation of the Spectral Distortion Measure
3.8.3.4 Fundamental Frequency Estimation
References
Chapter 4 Filter Banks for Spectral Analysis and Synthesis
4.1 Spectral Analysis Using Narrowband Filters
4.1.1 Short‐Term Spectral Analyzer
4.1.2 Prototype Filter Design for the Analysis Filter Bank
4.1.3 Short‐Term Spectral Synthesizer
4.1.4 Short‐Term Spectral Analysis and Synthesis
4.1.5 Prototype Filter Design for the Analysis–Synthesis filter bank
4.1.6 Filter Bank Interpretation of the DFT
4.2 Polyphase Network Filter Banks
4.2.1 PPN Analysis Filter Bank
4.2.2 PPN Synthesis Filter Bank
4.3 Quadrature Mirror Filter Banks
4.3.1 Analysis–Synthesis Filter Bank
4.3.2 Compensation of Aliasing and Signal Reconstruction
4.3.3 Efficient Implementation
4.4 Filter Bank Equalizer
4.4.1 The Reference Filter Bank
4.4.2 Uniform Frequency Resolution
4.4.3 Adaptive Filter Bank Equalizer: Gain Computation
4.4.3.1 Conventional Spectral Subtraction
4.4.3.2 Filter Bank Equalizer
4.4.4 Non‐uniform Frequency Resolution
4.4.5 Design Aspects & Implementation
References
Chapter 5 Stochastic Signals and Estimation
5.1 Basic Concepts
5.1.1 Random Events and Probability
5.1.2 Conditional Probabilities
5.1.3 Random Variables
5.1.4 Probability Distributions and Probability Density Functions
5.1.5 Conditional PDFs
5.2 Expectations and Moments
5.2.1 Conditional Expectations and Moments
5.2.2 Examples
5.2.2.1 The Uniform Distribution
5.2.2.2 The Gaussian Density
5.2.2.3 The Exponential Density
5.2.2.4 The Laplace Density
5.2.2.5 The Gamma Density
5.2.2.6 χ²‐Distribution
5.2.3 Transformation of a Random Variable
5.2.4 Relative Frequencies and Histograms
5.3 Bivariate Statistics
5.3.1 Marginal Densities
5.3.2 Expectations and Moments
5.3.3 Uncorrelatedness and Statistical Independence
5.3.4 Examples of Bivariate PDFs
5.3.4.1 The Bivariate Uniform Density
5.3.4.2 The Bivariate Gaussian Density
5.3.5 Functions of Two Random Variables
5.4 Probability and Information
5.4.1 Entropy
5.4.2 Kullback–Leibler Divergence
5.4.3 Cross‐Entropy
5.4.4 Mutual Information
5.5 Multivariate Statistics
5.5.1 Multivariate Gaussian Distribution
5.5.2 Gaussian Mixture Models
5.6 Stochastic Processes
5.6.1 Stationary Processes
5.6.2 Auto‐Correlation and Auto‐Covariance Functions
5.6.3 Cross‐Correlation and Cross‐Covariance Functions
5.6.4 Markov Processes
5.6.5 Multivariate Stochastic Processes
5.7 Estimation of Statistical Quantities by Time Averages
5.7.1 Ergodic Processes
5.7.2 Short‐Time Stationary Processes
5.8 Power Spectrum and its Estimation
5.8.1 White Noise
5.8.2 The Periodogram
5.8.3 Smoothed Periodograms
5.8.3.1 Non Recursive Smoothing in Time
5.8.3.2 Recursive Smoothing in Time
5.8.3.3 Log‐Mel Filter Bank Features
5.8.4 Power Spectra and Linear Shift‐Invariant Systems
5.9 Statistical Properties of Speech Signals
5.10 Statistical Properties of DFT Coefficients
5.10.1 Asymptotic Statistical Properties
5.10.2 Signal‐Plus‐Noise Model
5.10.3 Statistics of DFT Coefficients for Finite Frame Lengths
5.11 Optimal Estimation
5.11.1 MMSE Estimation
5.11.2 Estimation of Discrete Random Variables
5.11.3 Optimal Linear Estimator
5.11.4 The Gaussian Case
5.11.5 Joint Detection and Estimation
5.12 Non‐Linear Estimation with Deep Neural Networks
5.12.1 Basic Network Components
5.12.1.1 The Perceptron
5.12.1.2 Convolutional Neural Network
5.12.2 Basic DNN Structures
5.12.2.1 Fully‐Connected Feed‐Forward Network
5.12.2.2 Autoencoder Networks
5.12.2.3 Recurrent Neural Networks
5.12.2.4 Time Delay, Wavenet, and Transformer Networks
5.12.2.5 Training of Neural Networks
5.12.2.6 Stochastic Gradient Descent (SGD)
5.12.2.7 Adaptive Moment Estimation Method (ADAM)
References
Chapter 6 Linear Prediction
6.1 Vocal Tract Models and Short‐Term Prediction
6.1.1 All‐Zero Model
6.1.2 All‐Pole Model
6.1.3 Pole‐Zero Model
6.2 Optimal Prediction Coefficients for Stationary Signals
6.2.1 Optimum Prediction
6.2.2 Spectral Flatness Measure
6.3 Predictor Adaptation
6.3.1 Block‐Oriented Adaptation
6.3.1.1 Auto‐Correlation Method
6.3.1.2 Covariance Method
6.3.1.3 Levinson–Durbin Algorithm
6.3.2 Sequential Adaptation
6.4 Long‐Term Prediction
References
Chapter 7 Quantization
7.1 Analog Samples and Digital Representation
7.2 Uniform Quantization
7.3 Non‐uniform Quantization
7.4 Optimal Quantization
7.5 Adaptive Quantization
7.6 Vector Quantization
7.6.1 Principle
7.6.2 The Complexity Problem
7.6.3 Lattice Quantization
7.6.4 Design of Optimal Vector Code Books
7.6.5 Gain–Shape Vector Quantization
7.7 Quantization of the Predictor Coefficients
7.7.1 Scalar Quantization of the LPC Coefficients
7.7.2 Scalar Quantization of the Reflection Coefficients
7.7.3 Scalar Quantization of the LSF Coefficients
References
Chapter 8 Speech Coding
8.1 Speech‐Coding Categories
8.2 Model‐Based Predictive Coding
8.3 Linear Predictive Waveform Coding
8.3.1 First‐Order DPCM
8.3.2 Open‐Loop and Closed‐Loop Prediction
8.3.3 Quantization of the Residual Signal
8.3.3.1 Quantization with Open‐Loop Prediction
8.3.3.2 Quantization with Closed‐Loop Prediction
8.3.3.3 Spectral Shaping of the Quantization Error
8.3.4 ADPCM with Sequential Adaptation
8.4 Parametric Coding
8.4.1 Vocoder Structures
8.4.2 LPC Vocoder
8.5 Hybrid Coding
8.5.1 Basic Codec Concepts
8.5.1.1 Scalar Quantization of the Residual Signal
8.5.1.2 Vector Quantization of the Residual Signal
8.5.2 Residual Signal Coding: RELP
8.5.3 Analysis by Synthesis: CELP
8.5.3.1 Principle
8.5.3.2 Fixed Code Book
8.5.3.3 Long‐Term Prediction, Adaptive Code Book
8.6 Adaptive Postfiltering
8.7 Speech Codec Standards: Selected Examples
8.7.1 GSM Full‐Rate Codec
8.7.2 EFR Codec
8.7.3 Adaptive Multi‐Rate Narrowband Codec (AMR‐NB)
8.7.4 ITU‐T/G.722: 7 kHz Audio Coding within 64 kbit/s
8.7.5 Adaptive Multi‐Rate Wideband Codec (AMR‐WB)
8.7.6 Codec for Enhanced Voice Services (EVS)
8.7.7 Opus Codec IETF RFC 6716
References
Chapter 9 Concealment of Erroneous or Lost Frames
9.1 Concepts for Error Concealment
9.1.1 Error Concealment by Hard Decision Decoding
9.1.2 Error Concealment by Soft Decision Decoding
9.1.3 Parameter Estimation
9.1.3.1 MAP Estimation
9.1.3.2 MS Estimation
9.1.4 The A Posteriori Probabilities
9.1.4.1 The A Priori Knowledge
9.1.4.2 The Parameter Distortion Probabilities
9.1.5 Example: Hard Decision vs. Soft Decision
9.2 Examples of Error Concealment Standards
9.2.1 Substitution and Muting of Lost Frames
9.2.2 AMR Codec: Substitution and Muting of Lost Frames
9.2.3 EVS Codec: Concealment of Lost Packets
9.3 Further Improvements
References
Chapter 10 Bandwidth Extension of Speech Signals
10.1 BWE Concepts
10.2 BWE using the Model of Speech Production
10.2.1 Extension of the Excitation Signal
10.2.2 Spectral Envelope Estimation
10.2.2.1 Minimum Mean Square Error Estimation
10.2.2.2 Conditional Maximum A Posteriori Estimation
10.2.2.3 Extensions
10.2.2.4 Simplifications
10.2.3 Energy Envelope Estimation
10.3 Speech Codecs with Integrated BWE
10.3.1 BWE in the GSM Full‐Rate Codec
10.3.2 BWE in the AMR Wideband Codec
10.3.3 BWE in the ITU Codec G.729.1
References
Chapter 11 NELE: Near‐End Listening Enhancement
11.1 Frequency Domain NELE (FD)
11.1.1 Speech Intelligibility Index NELE Optimization
11.1.1.1 SII‐Optimized NELE Example
11.1.2 Closed‐Form Gain‐Shape NELE
11.1.2.1 The NoiseProp Shaping Function
11.1.2.2 The NoiseInverse Strategy
11.1.2.3 Gain‐Shape Frequency Domain NELE Example
11.2 Time Domain NELE (TD)
11.2.1 NELE Processing using Linear Prediction Filters
References
Chapter 12 Single‐Channel Noise Reduction
12.1 Introduction
12.2 Linear MMSE Estimators
12.2.1 Non‐causal IIR Wiener Filter
12.2.2 The FIR Wiener Filter
12.3 Speech Enhancement in the DFT Domain
12.3.1 The Wiener Filter Revisited
12.3.2 Spectral Subtraction
12.3.3 Estimation of the A Priori SNR
12.3.3.1 Decision‐Directed Approach
12.3.3.2 Smoothing in the Cepstrum Domain
12.3.4 Quality and Intelligibility Evaluation
12.3.4.1 Noise Oversubtraction
12.3.4.2 Spectral Floor
12.3.4.3 Limitation of the A Priori SNR
12.3.4.4 Adaptive Smoothing of the Spectral Gain
12.3.5 Spectral Analysis/Synthesis for Speech Enhancement
12.4 Optimal Non‐linear Estimators
12.4.1 Maximum Likelihood Estimation
12.4.2 Maximum A Posteriori Estimation
12.4.3 MMSE Estimation
12.4.3.1 MMSE Estimation of Complex Coefficients
12.4.3.2 MMSE Amplitude Estimation
12.5 Joint Optimum Detection and Estimation of Speech
12.6 Computation of Likelihood Ratios
12.7 Estimation of the A Priori and A Posteriori Probabilities of Speech Presence
12.7.1 Estimation of the A Priori Probability
12.7.2 A Posteriori Speech Presence Probability Estimation
12.7.3 SPP Estimation Using a Fixed SNR Prior
12.8 VAD and Noise Estimation Techniques
12.8.1 Voice Activity Detection
12.8.1.1 Detectors Based on the Subband SNR
12.8.2 Noise Power Estimation Based on Minimum Statistics
12.8.3 Noise Estimation Using a Soft‐Decision Detector
12.8.4 Noise Power Tracking Based on Minimum Mean Square Error Estimation
12.8.5 Evaluation of Noise Power Trackers
12.9 Noise Reduction with Deep Neural Networks
12.9.1 Processing Model
12.9.2 Estimation Targets
12.9.3 Loss Function
12.9.4 Input Features
12.9.5 Data Sets
References
Chapter 13 Dual‐Channel Noise and Reverberation Reduction
13.1 Dual‐Channel Wiener Filter
13.2 The Ideal Diffuse Sound Field and Its Coherence
13.3 Noise Cancellation
13.3.1 Implementation of the Adaptive Noise Canceller
13.4 Noise Reduction
13.4.1 Principle of Dual‐Channel Noise Reduction
13.4.2 Binaural Equalization–Cancellation and Common Gain Noise Reduction
13.4.3 Combined Single‐ and Dual‐Channel Noise Reduction
13.5 Dual‐Channel Dereverberation
13.6 Methods Based on Deep Learning
References
Chapter 14 Acoustic Echo Control
14.1 The Echo Control Problem
14.2 Echo Cancellation and Postprocessing
14.2.1 Echo Canceller with Center Clipper
14.2.2 Echo Canceller with Voice‐Controlled Soft‐Switching
14.2.3 Echo Canceller with Adaptive Postfilter
14.3 Evaluation Criteria
14.3.1 System Distance
14.3.2 Echo Return Loss Enhancement
14.4 The Wiener Solution
14.5 The LMS and NLMS Algorithms
14.5.1 Derivation and Basic Properties
14.6 Convergence Analysis and Control of the LMS Algorithm
14.6.1 Convergence in the Absence of Interference
14.6.2 Convergence in the Presence of Interference
14.6.3 Filter Order of the Echo Canceller
14.6.4 Stepsize Parameter
14.7 Geometric Projection Interpretation of the NLMS Algorithm
14.8 The Affine Projection Algorithm
14.9 Least‐Squares and Recursive Least‐Squares Algorithms
14.9.1 The Weighted Least‐Squares Algorithm
14.9.2 The RLS Algorithm
14.9.3 NLMS‐ and Kalman‐Algorithm
14.9.3.1 NLMS Algorithm
14.9.3.2 Kalman Algorithm
14.9.3.3 Summary of Kalman Algorithm
14.9.3.4 Remarks
14.10 Block Processing and Frequency Domain Adaptive Filters
14.10.1 Block LMS Algorithm
14.10.2 Frequency Domain Adaptive Filter (FDAF)
14.10.2.1 Fast Convolution and Overlap‐Save
14.10.2.2 FLMS Algorithm
14.10.2.3 Improved Stepsize Control
14.10.3 Subband Acoustic Echo Cancellation
14.10.4 Echo Canceller with Adaptive Postfilter in the Frequency Domain
14.10.5 Initialization with Perfect Sequences
14.11 Stereophonic Acoustic Echo Control
14.11.1 The Non‐uniqueness Problem
14.11.2 Solutions to the Non‐uniqueness Problem
References
Chapter 15 Microphone Arrays and Beamforming
15.1 Introduction
15.2 Spatial Sampling of Sound Fields
15.2.1 The Near‐field Model
15.2.2 The Far‐field Model
15.2.3 Sound Pickup in Reverberant Spaces
15.2.4 Spatial Correlation Properties of Acoustic Signals
15.2.5 Uniform Linear and Circular Arrays
15.2.6 Phase Ambiguity in Microphone Signals
15.3 Beamforming
15.3.1 Delay‐and‐Sum Beamforming
15.3.2 Filter‐and‐Sum Beamforming
15.4 Performance Measures and Spatial Aliasing
15.4.1 Array Gain and Array Sensitivity
15.4.2 Directivity Pattern
15.4.3 Directivity and Directivity Index
15.4.4 Example: Differential Microphones
15.5 Design of Fixed Beamformers
15.5.1 Minimum Variance Distortionless Response Beamformer
15.5.2 MVDR Beamformer with Limited Susceptibility
15.5.3 Linearly Constrained Minimum Variance Beamformer
15.5.4 Max‐SNR Beamformer
15.6 Multichannel Wiener Filter and Postfilter
15.7 Adaptive Beamformers
15.7.1 The Frost Beamformer
15.7.2 Generalized Side‐Lobe Canceller
15.7.3 Generalized Side‐lobe Canceller with Adaptive Blocking Matrix
15.7.4 Model‐Based Parsimonious‐Excitation‐Based GSC
15.8 Non‐linear Multi‐channel Noise Reduction
References
Index
EULA


📜 SIMILAR VOLUMES


Digital Speech Transmission: Enhancement
✍ Peter Vary, Rainer Martin 📂 Library 📅 2006 🏛 Wiley 🌐 English

The enormous advances in digital signal processing (DSP) technology have contributed to the wide dissemination and success of speech communication devices – be it GSM and UMTS mobile telephones, digital hearing aids, or human-machine interfaces. Digital speech transmission techniques play an importa

Power Transmission And Distribution, 2nd
✍ Anthony J. Pansini 📂 Library 📅 2004 🌐 English

This work describes the electrical, mechanical and economic considerations associated with the successful planning, design, construction, maintenance and operation of electrical transmission and distribution of power.

Telecommunication Transmission Systems,
✍ Robert G. Winch 📂 Library 📅 1998 🏛 Mcgraw-Hill (Tx) 🌐 English

The new edition of this bestselling guide contains all the information needed to master the evergrowing complexities of contemporary digital transmission equipment. Encompassing the full scope of the field, this book has the answers for engineers seeking to design and implement high-performance tele

Speech Sounds 2nd Edition (Language Work
✍ Patricia Ashby 📂 Library 📅 2005 🌐 English

Speech Sounds: * helps develop the fundamental skills of the phonetician * investigates the various aspects involved in the production of speech sounds * uses data-based material to reinforce each new concept * includes examples from a wide range of languages * provides dozens of exercises wi

Digital Speech: Coding for Low Bit Rate
✍ A. M. Kondoz 📂 Library 📅 2004 🏛 Wiley 🌐 English

Building on the success of the first edition Digital Speech offers extensive new, updated and revised material based upon the latest research. This Second Edition continues to provide the fundamental technical background required for low bit rate speech coding and the hottest developme
