
Evaluating the pronunciation component of text-to-speech systems for English: a performance comparison of different approaches

✍ Authors: R.I. Damper; Y. Marchand; M.J. Adamson; K. Gustafson


Publisher: Elsevier Science
Year: 1999
Language: English
File size: 233 KB
Volume: 13
Category: Article
ISSN: 0885-2308


✦ Synopsis


The automatic derivation of word pronunciations from input text is a central task for any text-to-speech system. For general English text at least, this is often thought to be a solved problem, with manually derived linguistic rules assumed capable of handling "novel" words missing from the system dictionary. Data-driven methods, based on machine learning of the regularities implicit in a large pronouncing dictionary, have received considerable attention recently but are generally thought to perform less well. However, these tentative beliefs are at best uncertain without powerful methods for comparing text-to-phoneme subsystems. This paper contributes to the development of such methods by comparing the performance of four representative approaches to automatic phonemization on the same test dictionary. As well as rule-based approaches, three data-driven techniques are evaluated: pronunciation by analogy (PbA), NETspeak and IB1-IG (a modified k-nearest neighbour method). Issues involved in comparative evaluation are detailed and elucidated. The data-driven techniques outperform rules in accuracy of letter-to-phoneme translation by a very significant margin, but they require aligned text-phoneme training data and are slower. The best translation results are obtained with PbA, at approximately 72% words correct on a reasonably large pronouncing dictionary, compared with roughly 26% words correct for the rules, indicating that automatic pronunciation of text is not a solved problem.
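The headline figures above are stated as "words correct". A minimal sketch of one plausible reading of that measure follows: a word counts as correct only if the system's whole phoneme string exactly matches the dictionary entry, and a word the system fails to pronounce counts as wrong. The function name `words_correct`, the dictionary-of-lists representation and the ARPAbet-like phoneme symbols are all hypothetical, chosen for illustration; the paper's own scoring procedure may differ in detail.

```python
from typing import Dict, List

# Hypothetical representation: word -> list of phoneme symbols.
Pronunciations = Dict[str, List[str]]


def words_correct(predicted: Pronunciations, reference: Pronunciations) -> float:
    """Return the percentage of reference words whose predicted phoneme
    string matches the dictionary pronunciation exactly.

    Words absent from `predicted` are scored as incorrect.
    (Illustrative helper, not the paper's implementation.)
    """
    if not reference:
        raise ValueError("reference dictionary is empty")
    hits = sum(
        1
        for word, phones in reference.items()
        if predicted.get(word) == phones  # whole-word exact match only
    )
    return 100.0 * hits / len(reference)


if __name__ == "__main__":
    # Toy test dictionary with ARPAbet-like symbols (invented example data).
    reference = {
        "cat": ["k", "ae", "t"],
        "phone": ["f", "ow", "n"],
        "yacht": ["y", "aa", "t"],
        "gnome": ["n", "ow", "m"],
    }
    # Output of some hypothetical letter-to-phoneme system.
    predicted = {
        "cat": ["k", "ae", "t"],
        "phone": ["f", "ow", "n"],
        "yacht": ["y", "ae", "ch", "t"],
        "gnome": ["g", "n", "ow", "m"],
    }
    print(words_correct(predicted, reference))  # 2 of 4 words exact -> 50.0
```

Scoring whole words, rather than individual phonemes, is a strict criterion: a single wrong phoneme makes the entire word wrong. This is one reason word-level accuracy figures can look low even when most symbols are right.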