
Topic-based language models using Dirichlet Mixtures

✍ Scribed by Kugatsu Sadamitsu; Takuya Mishina; Mikio Yamamoto


Book ID
104591286
Publisher
John Wiley and Sons
Year
2007
Tongue
English
Weight
407 KB
Volume
38
Category
Article
ISSN
0882-1666


✦ Synopsis



We propose a generative text model using Dirichlet Mixtures as a distribution for parameters of a multinomial distribution, whose compound distribution is Polya Mixtures, and show that the model exhibits high performance in application to statistical language models. In this paper, we discuss some methods for estimating parameters of Dirichlet Mixtures and for estimating the expectation values of the a posteriori distribution needed for adaptation, and then compare them with two previous text models. The first conventional model is the Mixture of Unigrams, which is often used for incorporating topics into statistical language models. The second one is LDA (Latent Dirichlet Allocation), a typical generative text model. In an experiment using document probabilities and dynamic adaptation of n‐gram models for newspaper articles, we show that the proposed model, in comparison with the two previous models, can achieve a lower perplexity at low mixture numbers. © 2007 Wiley Periodicals, Inc. Syst Comp Jpn, 38(12): 76–85, 2007; Published online in Wiley InterScience (www.interscience.wiley.com). DOI 10.1002/scj.20629
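
For readers who want the construction the abstract refers to spelled out, the following is a minimal sketch of the standard Dirichlet Mixture / Polya Mixture formulation; the notation (mixture weights λ_k, Dirichlet parameters α_k, word counts c_v) is ours and not taken from the paper, and the paper's actual estimation procedures are not reproduced here.

\[
p(\boldsymbol\theta) \;=\; \sum_{k=1}^{K} \lambda_k \,\mathrm{Dir}(\boldsymbol\theta;\boldsymbol\alpha_k),
\qquad
p(\mathbf{c}) \;=\; \sum_{k=1}^{K} \lambda_k \,
\frac{n!}{\prod_v c_v!}\,
\frac{\Gamma\!\bigl(\textstyle\sum_v \alpha_{k,v}\bigr)}{\Gamma\!\bigl(n+\textstyle\sum_v \alpha_{k,v}\bigr)}
\prod_v \frac{\Gamma(c_v+\alpha_{k,v})}{\Gamma(\alpha_{k,v})},
\]

where c_v is the count of word v in a document and n = Σ_v c_v. The first expression is the Dirichlet Mixture prior over the multinomial parameters θ; the second is the compound document probability, a mixture of Dirichlet-multinomial (Polya) distributions, i.e., the Polya Mixture. The a posteriori expectation used for adaptation then takes the usual form

\[
\mathbb{E}[\theta_v \mid \mathbf{c}]
\;=\; \sum_{k=1}^{K} \pi_k(\mathbf{c})\,
\frac{c_v+\alpha_{k,v}}{n+\sum_u \alpha_{k,u}},
\qquad
\pi_k(\mathbf{c}) \;\propto\; \lambda_k\, p_k(\mathbf{c}),
\]

with p_k the k-th Polya component, so the adapted unigram probability is a responsibility-weighted, smoothed relative frequency.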


📜 SIMILAR VOLUMES


Topic-Dependent-Class-Based N-Gram Language Model
✍ Naptali, W.; Tsuchiya, M.; Nakagawa, S. 📂 Article 📅 2012 🏛 Institute of Electrical and Electronics Engineers 🌐 English ⚖ 865 KB