Influencing Factors in Speech Quality Assessment using Crowdsourcing
Scribed by Rafael Zequeira Jiménez
- Publisher
- Springer
- Year
- 2022
- Tongue
- English
- Leaves
- 129
- Category
- Library
✦ Synopsis
This book evaluates the impact of relevant factors affecting the results of speech quality assessment studies carried out in crowdsourcing. The author describes how these factors relate to the test structure, the effect of environmental background noise, and the influence of language differences. He details multiple user-centered studies conducted to derive guidelines for the reliable collection of speech quality scores in crowdsourcing. Specifically, the studies address questions such as the optimal number of speech samples to include in a listening task, the influence of environmental background noise on speech quality ratings, methods for classifying background noise from web audio recordings, and the impact of language proficiency on the user's perception of speech quality. Ultimately, the results of these studies contributed to ITU-T Recommendation P.808, which defines guidelines for conducting speech quality studies in crowdsourcing.
✦ Table of Contents
Abstract
Zusammenfassung
Acknowledgments
Contents
About the Author
Acronyms
1 Introduction
1.1 Speech Quality
1.1.1 Speech Quality Assessment
1.1.2 Crowdsourcing
1.1.3 Speech Quality Assessment in Crowdsourcing
1.1.4 Differences Between Laboratory-Based and Crowdsourcing-Based Speech Quality Assessments
1.2 Influencing Factors in Speech Quality Assessment Using Crowdsourcing
1.3 Research Questions and Thesis Outline
2 Related Work
2.1 Number of Stimuli
2.2 Worker Performance and Task Repetition
2.3 Environmental Background Noise
2.4 Influence of Language Differences
2.5 Conclusion
3 Method
3.1 Laboratory Test
3.2 Speech Database
3.2.1 SwissQual 501
3.2.2 SwissQual 502
3.2.3 SwissQual P.501 Annex D
3.3 Crowdsourcing Test
3.3.1 Standardized Evaluation Method for Speech Quality in Crowdsourcing
3.3.2 Crowdsourcing Platforms
3.3.3 Test Setup and Procedure
3.3.3.1 Qualification
3.3.3.2 Training
3.3.3.3 Assessment
3.3.4 Environment
3.3.4.1 Study Setup
3.3.4.2 Audio Recording Setup
3.3.4.3 Environment Video Recording
3.3.4.4 Environment Questionnaire
3.3.4.5 Audio Recording Analysis
3.3.4.6 Questionnaire Results
3.3.4.7 Discussion
3.4 Simulated Crowdsourcing Test in Laboratory
3.5 Result Metrics
3.6 Conclusion
4 Test Structure
4.1 Influence of Number of Stimuli
4.1.1 Study Setup
4.1.1.1 Speech Database
4.1.1.2 Method
4.1.2 Results
4.1.2.1 Qualification
4.1.2.2 Training and Assessment
4.1.2.3 Influence of Number of Stimuli
4.1.3 Discussion
4.2 Impact of Task Repetition
4.2.1 Study Setup
4.2.1.1 Speech Material
4.2.1.2 Laboratory Study
4.2.1.3 Crowdsourcing Study
4.2.2 Results
4.2.2.1 Inter-Rater Reliability
4.2.2.2 Intra-Rater Reliability
4.2.2.3 Predicting Workers' Performance
4.2.3 Discussion
4.3 Conclusion
5 Impact of Background Noise
5.1 Effect of Environmental Background Noise
5.1.1 Study Setup
5.1.1.1 Background Noise Signals
5.1.1.2 Speech Database
5.1.2 Results
5.1.2.1 Laboratory vs. CSLvl0
5.1.2.2 Influence of Background Noise
5.1.3 Discussion
5.2 Analysis of Noisy Speech Quality Scores Collected in Crowdsourcing Environments
5.2.1 Speech Quality Scores
5.2.2 Model
5.2.2.1 Feature Selection
5.2.2.2 Model Evaluation
5.2.2.3 Model Tuning
5.2.3 Discussion
5.3 Environment Background Noise Classification
5.3.1 Environment Background Noise Collection
5.3.1.1 Dataset for Noise Classification
5.3.1.2 Dataset for Noise Level Estimation
5.3.2 Experiment BN1
5.3.2.1 Results
5.3.3 Experiment BN2
5.3.3.1 Results
5.3.4 Discussion
5.4 Conclusion
6 Influence of Language
6.1 Study Setup
6.1.1 Speech Database
6.1.2 Method
6.2 Results
6.2.1 Analysis of Laboratory vs. Studies E1, E2, and E3
6.2.2 Influence of Language Differences
6.2.3 Analysis of Conditions per Group
6.3 Conclusion
7 Conclusion
Appendix A
A.1 Speech Database SwissQual 501
Appendix B
B.1 Speech Database SwissQual 502
Appendix C
C.1 Speech Database SwissQual P.501 Annex D
References
Index
SIMILAR VOLUMES
This book reviews research towards perceptual quality dimensions of synthetic speech, compares these findings with the state of the art, and derives a set of five universal perceptual quality dimensions for TTS signals. They are: (i) naturalness of voice, (ii) prosodic quality, (iii) fluency a
Many resources are invested in the development and introduction of Quality Assurance Systems in educational institutions all over the world. Our assumption is that, as a result of quality assurance activities, practitioners obtain information about their own functioning and institutional perfo
The quality of a telecommunication voice service is largely influenced by the quality of the transmission system. Nevertheless, the analysis, synthesis and prediction of quality should take into account its multidimensional aspects. Quality can be regarded as a point where the perceived character
Provides an insightful and practical introduction to crowdsourcing as a means of rapidly processing speech data. Intended for those who want to get started in the domain and learn how to set up a task, what interfaces are available, how to assess the work, etc. as well as for