An empirical study of smoothing techniques for language modeling

✍ Scribed by Stanley F. Chen; Joshua Goodman


Publisher: Elsevier Science
Year: 1999
Tongue: English
Weight: 641 KB
Volume: 13
Category: Article
ISSN: 0885-2308

✦ Synopsis


We survey the most widely-used algorithms for smoothing models for language n-gram modeling. We then present an extensive empirical comparison of several of these smoothing techniques, including those described by Jelinek and Mercer (1980); Katz (1987); Bell, Cleary and Witten (1990); Ney, Essen and Kneser (1994); and Kneser and Ney (1995). We investigate how factors such as training data size, training corpus (e.g. Brown vs. Wall Street Journal), count cutoffs, and n-gram order (bigram vs. trigram) affect the relative performance of these methods, which is measured through the cross-entropy of test data. We find that these factors can significantly affect the relative performance of models, with the most significant factor being training data size. Since no previous comparisons have examined these factors systematically, this is the first thorough characterization of the relative performance of various algorithms. In addition, we introduce methodologies for analyzing smoothing algorithm efficacy in detail, and using these techniques we motivate a novel variation of Kneser-Ney smoothing that consistently outperforms all other algorithms evaluated. Finally, results showing that improved language model smoothing leads to improved speech recognition performance are presented.
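As a rough illustration of the evaluation measure named in the synopsis (cross-entropy of test data), the sketch below scores a toy bigram model smoothed by simple linear interpolation in the spirit of Jelinek and Mercer (1980). It is not the authors' implementation: the function names, the fixed weight `lam`, and the add-one unigram floor are all assumptions made for this example, and a real system would tune the interpolation weights on held-out data.

```python
import math
from collections import Counter


def train_counts(tokens):
    """Collect unigram, bigram, and history counts from a token list."""
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    # A token counts as a history only when another token follows it.
    histories = Counter(tokens[:-1])
    return unigrams, bigrams, histories


def interpolated_prob(word, prev, counts, vocab_size, lam=0.7):
    """P(word | prev) as a mix of bigram ML and add-one unigram estimates.

    Assumption: vocab_size covers every training word plus any unseen
    words we want to score, so the unigram estimate sums to one.
    """
    unigrams, bigrams, histories = counts
    total = sum(unigrams.values())
    p_uni = (unigrams[word] + 1) / (total + vocab_size)
    if histories[prev] == 0:
        # Unseen history: fall back entirely to the unigram estimate.
        return p_uni
    p_ml = bigrams[(prev, word)] / histories[prev]
    return lam * p_ml + (1 - lam) * p_uni


def cross_entropy(test_tokens, counts, vocab_size, lam=0.7):
    """Average negative log2 probability per predicted token (bits/token)."""
    log_prob = 0.0
    for prev, word in zip(test_tokens, test_tokens[1:]):
        log_prob += math.log2(
            interpolated_prob(word, prev, counts, vocab_size, lam)
        )
    return -log_prob / (len(test_tokens) - 1)


train = "the cat sat on the mat".split()
vocab = set(train) | {"dog"}  # hypothetical vocabulary with one unseen word
counts = train_counts(train)

h_seen = cross_entropy("the cat sat".split(), counts, len(vocab))
h_unseen = cross_entropy("the dog sat".split(), counts, len(vocab))
print(f"seen bigrams:   {h_seen:.3f} bits/token")
print(f"unseen bigrams: {h_unseen:.3f} bits/token")
```

Lower cross-entropy means the model assigns higher probability to the test text; perplexity is `2 ** cross_entropy`. Test text containing bigrams absent from training scores worse, which is the gap that smoothing methods try to narrow.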


📜 SIMILAR VOLUMES


Validation of an empirical model for pho
✍ Alados, I.; Alados-Arboledas, L. 📂 Article 📅 1999 🏛 John Wiley and Sons 🌐 English ⚖ 248 KB

Knowledge of the photosynthetically active radiation is necessary in different applications dealing with plant physiology, biomass production and natural illumination in greenhouses. Nevertheless, as a result of the absence of widespread measurements of this radiometric flux, it is often calculated

The role of prior experience and task ch
✍ Ritu Agarwal; Atish P. Sinha; Mohan Tanniru 📂 Article 📅 1996 🏛 Elsevier Science 🌐 English ⚖ 335 KB

The object-oriented methodology for systems analysis and design has generated considerable interest recently. Object-orientation represents a fundamental shift in focus from the traditional process-oriented approaches that have dominated software development for over two decades. Although there is

An empirical study of children's source
✍ Yin Zhang 📂 Article 📅 2005 🏛 Wiley (John Wiley & Sons) 🌐 English ⚖ 239 KB

This poster reports on an empirical study on children's source use for their Internet searches. A group of third‐ and fifth‐grade students participated in this study over a 15‐week period, during which the students conducted Internet searches for their schoolwork as part of their curric