

New Era for Robust Speech Recognition: Exploiting Deep Learning

✍ Scribed by Shinji Watanabe, Marc Delcroix, Florian Metze, John R. Hershey (eds.)


Publisher
Springer International Publishing
Year
2017
Tongue
English
Leaves
433
Edition
1
Category
Library


✦ Synopsis


This book covers the state of the art in deep neural-network-based methods for noise robustness in distant speech recognition applications. It provides insights and detailed descriptions of some of the new concepts and key technologies in the field, including novel architectures for speech enhancement, microphone arrays, robust features, acoustic model adaptation, training data augmentation, and training criteria. The contributed chapters also include descriptions of real-world applications, benchmark tools, and datasets widely used in the field.

This book is intended for researchers and practitioners working in the field of speech processing and recognition who are interested in the latest deep learning techniques for noise robustness. It will also be of interest to graduate students in electrical engineering or computer science, who will find it a useful guide to this field of research.


✦ Table of Contents


Front Matter ....Pages i-xvii
Front Matter ....Pages 1-1
Preliminaries (Shinji Watanabe, Marc Delcroix, Florian Metze, John R. Hershey)....Pages 3-17
Front Matter ....Pages 19-19
Multichannel Speech Enhancement Approaches to DNN-Based Far-Field Speech Recognition (Marc Delcroix, Takuya Yoshioka, Nobutaka Ito, Atsunori Ogawa, Keisuke Kinoshita, Masakiyo Fujimoto et al.)....Pages 21-49
Multichannel Spatial Clustering Using Model-Based Source Separation (Michael I. Mandel, Jon P. Barker)....Pages 51-77
Discriminative Beamforming with Phase-Aware Neural Networks for Speech Enhancement and Recognition (Xiong Xiao, Shinji Watanabe, Hakan Erdogan, Michael Mandel, Liang Lu, John R. Hershey et al.)....Pages 79-104
Raw Multichannel Processing Using Deep Neural Networks (Tara N. Sainath, Ron J. Weiss, Kevin W. Wilson, Arun Narayanan, Michiel Bacchiani, Bo Li et al.)....Pages 105-133
Novel Deep Architectures in Speech Processing (John R. Hershey, Jonathan Le Roux, Shinji Watanabe, Scott Wisdom, Zhuo Chen, Yusuf Isik)....Pages 135-164
Deep Recurrent Networks for Separation and Recognition of Single-Channel Speech in Nonstationary Background Audio (Hakan Erdogan, John R. Hershey, Shinji Watanabe, Jonathan Le Roux)....Pages 165-186
Robust Features in Deep-Learning-Based Speech Recognition (Vikramjit Mitra, Horacio Franco, Richard M. Stern, Julien van Hout, Luciana Ferrer, Martin Graciarena et al.)....Pages 187-217
Adaptation of Deep Neural Network Acoustic Models for Robust Automatic Speech Recognition (Khe Chai Sim, Yanmin Qian, Gautam Mantena, Lahiru Samarakoon, Souvik Kundu, Tian Tan)....Pages 219-243
Training Data Augmentation and Data Selection (Martin Karafiát, Karel Veselý, Kateřina Žmolíková, Marc Delcroix, Shinji Watanabe, Lukáš Burget et al.)....Pages 245-260
Advanced Recurrent Neural Networks for Automatic Speech Recognition (Yu Zhang, Dong Yu, Guoguo Chen)....Pages 261-279
Sequence-Discriminative Training of Neural Networks (Guoguo Chen, Yu Zhang, Dong Yu)....Pages 281-297
End-to-End Architectures for Speech Recognition (Yajie Miao, Florian Metze)....Pages 299-323
Front Matter ....Pages 325-325
The CHiME Challenges: Robust Speech Recognition in Everyday Environments (Jon P. Barker, Ricard Marxer, Emmanuel Vincent, Shinji Watanabe)....Pages 327-344
The REVERB Challenge: A Benchmark Task for Reverberation-Robust ASR Techniques (Keisuke Kinoshita, Marc Delcroix, Sharon Gannot, Emanuël A. P. Habets, Reinhold Haeb-Umbach, Walter Kellermann et al.)....Pages 345-354
Distant Speech Recognition Experiments Using the AMI Corpus (Steve Renals, Pawel Swietojanski)....Pages 355-368
Toolkits for Robust Speech Processing (Shinji Watanabe, Takaaki Hori, Yajie Miao, Marc Delcroix, Florian Metze, John R. Hershey)....Pages 369-382
Front Matter ....Pages 383-383
Speech Research at Google to Enable Universal Speech Interfaces (Michiel Bacchiani, Françoise Beaufays, Alexander Gruenstein, Pedro Moreno, Johan Schalkwyk, Trevor Strohman et al.)....Pages 385-399
Challenges in and Solutions to Deep Learning Network Acoustic Modeling in Speech Recognition Products at Microsoft (Yifan Gong, Yan Huang, Kshitiz Kumar, Jinyu Li, Chaojun Liu, Guoli Ye et al.)....Pages 401-417
Advanced ASR Technologies for Mitsubishi Electric Speech Applications (Yuuki Tachioka, Toshiyuki Hanazawa, Tomohiro Narita, Jun Ishii)....Pages 419-429
Back Matter ....Pages 431-436

✦ Subjects


Artificial Intelligence (incl. Robotics)


📜 SIMILAR VOLUMES


Deep Learning for NLP and Speech Recogni
✍ Uday Kamath & John Liu & James Whitaker 📂 Library 📅 2019 🏛 Springer 🌐 English

With the widespread adoption of deep learning, natural language processing (NLP), and speech applications in many areas (including Finance, Healthcare, and Government) there is a growing need for one comprehensive resource that maps deep learning techniques to NLP and speech and provides insights int

Deep Learning for NLP and Speech Recogni
✍ Uday Kamath, John Liu, James Whitaker 📂 Library 📅 2019 🏛 Springer International Publishing 🌐 English

This textbook explains Deep Learning Architecture, with applications to various NLP Tasks, including Document Classification, Machine Translation, Language Modeling, and Speech Recognition. With the widespread adoption of deep learning, natural language processing (NLP), and speech applications in

Deep Learning for NLP and Speech Recogni
✍ Kamath, Uday; Liu, John; Whitaker, James 📂 Library 📅 2019 🏛 Springer International Publishing 🌐 English

This textbook explains Deep Learning Architecture, with applications to various NLP Tasks, including Document Classification, Machine Translation, Language Modeling, and Speech Recognition. With the widespread adoption of deep learning, natural language processing (NLP), and speech applications in ma