Prediction of abstract prosodic labels for speech synthesis
Authors: K. Ross; M. Ostendorf
- Book ID: 102565933
- Publisher: Elsevier Science
- Year: 1996
- Language: English
- File size: 222 KB
- Volume: 10
- Category: Article
- ISSN: 0885-2308
Synopsis
Higher quality speech synthesis is required to make text-to-speech technologies useful in more applications, and prosody is one of the components of synthesis technology in greatest need of improvement. This paper describes computational models for the prediction of abstract prosodic labels for synthesis (accent location, symbolic tones and relative prominence level) from text that is tagged with part-of-speech labels and marked for prosodic constituent structure. Specifically, the model uses multiple levels of a prosodic hierarchy and, at each level, combines decision tree probability functions with Markov sequence assumptions. An advantage of decision trees is the ability to incorporate linguistic knowledge in an automatic training framework, which is needed for building systems that reflect particular speaking styles. Studies of accent and tone variability across speakers are reported and used to motivate new evaluation metrics. Prediction experiments show an improvement in the accuracy of prominence location prediction over simple decision trees, with accuracy similar to the level of variability observed across speakers.
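To make the idea of combining decision tree probabilities with a Markov sequence assumption more concrete, the sketch below decodes a label sequence at a single level only. It is not the authors' model: the function names (`tree_prob`, `viterbi`), the `pos` feature, the two-label set and every probability value are invented for illustration, and the hand-written `tree_prob` merely stands in for a tree that would normally be trained from labeled data.

```python
import math

# Hypothetical label set; the paper also covers tones and prominence levels.
LABELS = ("accented", "unaccented")


def tree_prob(features, prev_label):
    """Toy stand-in for a trained decision tree.

    Returns P(label | features, prev_label). The questions asked and the
    probabilities returned are invented for illustration only.
    """
    if features["pos"] in ("NOUN", "ADJ", "VERB"):  # content word
        p_acc = 0.6 if prev_label == "accented" else 0.85
    else:  # function word
        p_acc = 0.05 if prev_label == "accented" else 0.15
    return {"accented": p_acc, "unaccented": 1.0 - p_acc}


def viterbi(sequence, start_label="unaccented"):
    """Find the most probable label sequence under a first-order Markov
    assumption: each label depends on its own features and the previous label."""
    # best[label] = (log probability, path) of the best path ending in label
    best = {start_label: (0.0, [])}
    for features in sequence:
        new_best = {}
        for prev, (score, path) in best.items():
            probs = tree_prob(features, prev)
            for label in LABELS:
                cand = score + math.log(probs[label])
                if label not in new_best or cand > new_best[label][0]:
                    new_best[label] = (cand, path + [label])
        best = new_best
    # Pick the highest-scoring path over all possible final labels.
    return max(best.values(), key=lambda sp: sp[0])[1]


# Part-of-speech tags for a four-word phrase (hypothetical input).
words = [{"pos": "DET"}, {"pos": "ADJ"}, {"pos": "NOUN"}, {"pos": "VERB"}]
labels = viterbi(words)
print(labels)
```

Because the tree conditions on the previous label, the decoder scores whole sequences rather than choosing each label in isolation, which is the kind of gain over simple decision trees that the synopsis refers to.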