Statistical language modeling with a class-based n-multigram model
Authors: Sabine Deligne; Yoshinori Sagisaka
- Book ID: 102566850
- Publisher: Elsevier Science
- Year: 2000
- Language: English
- File size: 166 KB
- Volume: 14
- Category: Article
- ISSN: 0885-2308
## Synopsis
In this paper, we present a stochastic language-modeling tool that retrieves variable-length phrases (multigrams) under the assumption of n-gram dependencies between them, hence the name of the model: n-multigram. Estimation of the phrase probability distribution is interleaved with a phrase-clustering procedure so that the two jointly optimize the likelihood of the data. As a result, the language data are iteratively structured at both the paradigmatic and the syntagmatic level in a fully integrated way. We evaluate the 2-multigram model as a statistical language model on ATIS, a task-oriented database of air travel reservations. Experiments show that, relative to the usual trigram model, the 2-multigram model reduces the word error rate on ATIS by 10% while using 25% fewer parameters. In addition, we illustrate the model's ability to merge semantically related phrases of different lengths into a common class.
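To make the idea of variable-length phrases with n-gram dependencies concrete, here is a minimal Python sketch of one ingredient only: finding the most likely segmentation of a sentence under a 2-multigram (phrase-bigram) model by dynamic programming. It is an illustration under stated assumptions, not the authors' implementation. The probability table `TOY_PROBS`, the `FLOOR` value for unseen phrase pairs, and the names `phrase_logprob` and `best_segmentation` are all hypothetical, and the sketch omits both the phrase-clustering step and the parameter estimation described in the synopsis.

```python
import math

# Hypothetical toy probabilities p(phrase | previous phrase); not from the paper.
TOY_PROBS = {
    (("<s>",), ("i", "want", "to")): 0.5,
    (("i", "want", "to"), ("fly", "to")): 0.4,
    (("fly", "to"), ("boston",)): 0.6,
}
FLOOR = 1e-4  # assumed probability for any unseen phrase pair


def phrase_logprob(prev, phrase):
    """Log-probability of `phrase` given the previous phrase `prev`."""
    return math.log(TOY_PROBS.get((prev, phrase), FLOOR))


def best_segmentation(words, logprob=phrase_logprob, max_len=3):
    """Return (phrases, log-prob) of the best segmentation of `words`
    into phrases of at most `max_len` words, each scored given its
    predecessor (a 2-multigram model)."""
    n = len(words)
    if n == 0:
        return [], 0.0
    # best[(end, length)] = (score, length of the preceding phrase),
    # where the last phrase covers words[end - length:end].
    best = {(0, 0): (0.0, None)}
    for end in range(1, n + 1):
        for length in range(1, min(max_len, end) + 1):
            begin = end - length
            phrase = tuple(words[begin:end])
            candidates = []
            for prev_len in range(0, min(max_len, begin) + 1):
                if (begin, prev_len) not in best:
                    continue
                # A zero-length predecessor means sentence start.
                prev = tuple(words[begin - prev_len:begin]) if prev_len else ("<s>",)
                score = best[(begin, prev_len)][0] + logprob(prev, phrase)
                candidates.append((score, prev_len))
            if candidates:
                best[(end, length)] = max(candidates)
    # Pick the best final state, then follow back-pointers to the start.
    score, length = max(
        (best[(n, l)][0], l) for l in range(1, max_len + 1) if (n, l) in best
    )
    phrases, end = [], n
    while end > 0:
        phrases.append(tuple(words[end - length:end]))
        prev_len = best[(end, length)][1]
        end, length = end - length, prev_len
    return phrases[::-1], score


phrases, score = best_segmentation("i want to fly to boston".split())
print(phrases, math.exp(score))
```

With the toy table above, the search prefers the three-phrase segmentation "i want to | fly to | boston", because every other segmentation contains at least one unseen phrase pair. A likelihood computation would sum over all segmentations instead of taking the maximum, which amounts to replacing `max` with a log-sum in the same recursion.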