
Statistical language modeling with a class-based n-multigram model

✍ Scribed by Sabine Deligne; Yoshinori Sagisaka


Book ID: 102566850
Publisher: Elsevier Science
Year: 2000
Tongue: English
Weight: 166 KB
Volume: 14
Category: Article
ISSN: 0885-2308


✦ Synopsis


In this paper, we present a stochastic language-modeling tool which aims at retrieving variable-length phrases (multigrams), assuming n-gram dependencies between them, hence the name of the model: n-multigram. The estimation of the probability distribution of the phrases is intermixed with a phrase-clustering procedure in a way which jointly optimizes the likelihood of the data. As a result, the language data are iteratively structured at both a paradigmatic and a syntagmatic level in a fully integrated way. We evaluate the 2-multigram model as a statistical language model on ATIS, a task-oriented database consisting of air travel reservations. Experiments show that the 2-multigram model allows a reduction of 10% of the word error rate on ATIS with respect to the usual trigram model, with 25% fewer parameters than in the trigram model. In addition, we illustrate the ability of this model to merge semantically related phrases of different lengths into a common class.
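To make the idea of "n-gram dependencies between variable-length phrases" concrete, the following is a minimal sketch of how the likelihood of a word sequence could be computed under a 2-multigram model, by summing over every segmentation of the sequence into phrases. It is not the authors' implementation: the function name `two_multigram_likelihood`, the `max_len` limit, the `<s>` start marker, and the toy probability table are all assumptions made for illustration, and the sketch covers neither the estimation procedure nor the phrase clustering described in the synopsis.

```python
from collections import defaultdict

START = ("<s>",)  # hypothetical sentence-start marker


def two_multigram_likelihood(words, bigram_prob, max_len=3):
    """Likelihood of `words`, summed over all segmentations into phrases.

    Each phrase is a tuple of at most `max_len` words and depends only on
    the phrase before it (a bigram over phrases). `bigram_prob` maps
    (previous_phrase, phrase) to a probability; missing pairs count as 0.
    """
    n = len(words)
    # alpha[(end, length)]: total probability of words[:end], restricted to
    # segmentations whose last phrase is words[end - length:end].
    alpha = defaultdict(float)
    for end in range(1, n + 1):
        for length in range(1, min(max_len, end) + 1):
            begin = end - length
            phrase = tuple(words[begin:end])
            if begin == 0:
                # First phrase of the sentence: condition on the start marker.
                alpha[(end, length)] = bigram_prob.get((START, phrase), 0.0)
                continue
            total = 0.0
            for prev_len in range(1, min(max_len, begin) + 1):
                prev = tuple(words[begin - prev_len:begin])
                total += alpha[(begin, prev_len)] * bigram_prob.get((prev, phrase), 0.0)
            alpha[(end, length)] = total
    # Sum over every possible length of the final phrase.
    return sum(alpha[(n, length)] for length in range(1, min(max_len, n) + 1))


# Toy table (made-up numbers, for illustration only).
toy = {
    (START, ("a",)): 0.5,
    (START, ("a", "b")): 0.25,
    (("a",), ("b",)): 0.5,
}
likelihood = two_multigram_likelihood(["a", "b"], toy)
```

With this toy table, the sequence "a b" has two segmentations, [a][b] and [a b], and the function adds their probabilities. A class-based variant would, under the same assumptions, replace the phrase-to-phrase table with class-to-class probabilities multiplied by the probability of each phrase given its class.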


📜 SIMILAR VOLUMES


Predicting reading difficulty with stati
✍ Kevyn Collins-Thompson; Jamie Callan 📂 Article 📅 2005 🏛 John Wiley and Sons 🌐 English ⚖ 373 KB

A potentially useful feature of information retrieval systems for students is the ability to identify documents that not only are relevant to the query but also match the student's reading level. Manually obtaining an estimate of reading difficulty for each document is not feasible for

A maximum entropy approach to adaptive s
✍ Ronald Rosenfeld 📂 Article 📅 1996 🏛 Elsevier Science 🌐 English ⚖ 320 KB

An adaptive statistical language model is described, which successfully integrates long distance linguistic information with other knowledge sources. Most existing statistical language models exploit only the immediate history of a text. To extract information from further back in the document's his

Minimal sufficient statistics for a gene
✍ André I. Khuri 📂 Article 📅 1997 🏛 Elsevier Science 🌐 English ⚖ 303 KB

Khuri and Ghosh (1990) used a certain technique to obtain minimal sufficient statistics for the unbalanced random two-fold nested model. The present article provides an extension of this technique to a general class of unbalanced models with fixed and random effects.