Topic-based language models using Dirichlet Mixtures
Authors: Kugatsu Sadamitsu; Takuya Mishina; Mikio Yamamoto
- Book ID: 104591286
- Publisher: John Wiley and Sons
- Year: 2007
- Language: English
- File size: 407 KB
- Volume: 38
- Category: Article
- ISSN: 0882-1666
Abstract
We propose a generative text model using Dirichlet Mixtures as a distribution for parameters of a multinomial distribution, whose compound distribution is Polya Mixtures, and show that the model exhibits high performance in application to statistical language models. In this paper, we discuss some methods for estimating parameters of Dirichlet Mixtures and for estimating the expectation values of the a posteriori distribution needed for adaptation, and then compare them with two previous text models. The first conventional model is the Mixture of Unigrams, which is often used for incorporating topics into statistical language models. The second one is LDA (Latent Dirichlet Allocation), a typical generative text model. In an experiment using document probabilities and dynamic adaptation of n-gram models for newspaper articles, we show that the proposed model, in comparison with the two previous models, can achieve a lower perplexity at low mixture numbers. © 2007 Wiley Periodicals, Inc. Syst Comp Jpn, 38(12): 76–85, 2007; Published online in Wiley InterScience (www.interscience.wiley.com). DOI 10.1002/scj.20629
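The compound of a multinomial with a Dirichlet prior is the Polya (Dirichlet-multinomial) distribution, so under a Polya Mixture a document's bag-of-words probability is a mixture-weighted sum of Polya components. A minimal sketch of that document probability in Python, under stated assumptions: the mixture weights and Dirichlet parameters below are toy values (not estimates from the paper), and the multinomial coefficient is dropped because it does not depend on the parameters.

```python
import math

def log_polya(counts, alpha):
    """Log Dirichlet-multinomial (Polya) likelihood of word counts,
    omitting the multinomial coefficient (constant w.r.t. alpha)."""
    n = sum(counts)
    a0 = sum(alpha)
    return (math.lgamma(a0) - math.lgamma(a0 + n)
            + sum(math.lgamma(a + c) - math.lgamma(a)
                  for a, c in zip(alpha, counts)))

def log_doc_prob(counts, weights, alphas):
    """Log probability of a document under a Polya Mixture:
    log sum_k pi_k * Polya(counts | alpha_k), via log-sum-exp."""
    comps = [math.log(w) + log_polya(counts, a)
             for w, a in zip(weights, alphas)]
    m = max(comps)
    return m + math.log(sum(math.exp(c - m) for c in comps))

# Toy setup: 3-word vocabulary, 2 mixture components (hypothetical values).
counts = [4.0, 1.0, 0.0]
weights = [0.5, 0.5]
alphas = [[2.0, 1.0, 1.0], [0.5, 0.5, 3.0]]
print(log_doc_prob(counts, weights, alphas))
```

The log-sum-exp step keeps the mixture sum numerically stable for long documents, where each component's log likelihood can be strongly negative.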