✦ LIBER ✦

Statistical language modeling with a class-based n-multigram model

✍ Scribed by Sabine Deligne; Yoshinori Sagisaka

Book ID: 102566850
Publisher: Elsevier Science
Year: 2000
Tongue: English
Weight: 166 KB
Volume: 14
Category: Article
ISSN: 0885-2308
DOI: 10.1006/csla.2000.0146

✦ Synopsis

In this paper, we present a stochastic language-modeling tool that aims to retrieve variable-length phrases (multigrams) while assuming n-gram dependencies between them, hence the name of the model: n-multigram. Estimation of the phrase probability distribution is interleaved with a phrase-clustering procedure, so that the two jointly optimize the likelihood of the data. As a result, the language data are iteratively structured at both the paradigmatic and the syntagmatic level in a fully integrated way. We evaluate the 2-multigram model as a statistical language model on ATIS, a task-oriented database of air travel reservations. Experiments show that the 2-multigram model achieves a 10% reduction in word error rate on ATIS compared with the usual trigram model, while using 25% fewer parameters than the trigram model. In addition, we illustrate the model's ability to merge semantically related phrases of different lengths into a common class.
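To make the idea of "variable-length phrases with n-gram dependencies between them" concrete, here is a minimal sketch of decoding with a class-based 2-multigram: a Viterbi search for the single most likely segmentation of a sentence into phrases. Several things are assumptions, not taken from the paper: the factorization P(phrase_k | phrase_{k-1}) = P(class_k | class_{k-1}) · P(phrase_k | class_k), the hypothetical function `best_segmentation`, the `max_len` limit, and the toy phrase inventory and probabilities. The synopsis describes estimation that optimizes the data likelihood jointly with clustering; this sketch covers only the best-path decoding step and performs no estimation or clustering.

```python
import math
from typing import Dict, List, Optional, Tuple

Phrase = Tuple[str, ...]

def best_segmentation(
    words: List[str],
    phrase_class: Dict[Phrase, str],
    class_bigram: Dict[Tuple[str, str], float],
    member_prob: Dict[Phrase, float],
    max_len: int = 3,
) -> Tuple[List[Phrase], float]:
    """Viterbi search for the most likely phrase segmentation (sketch).

    ASSUMPTION: P(phrase_k | phrase_{k-1}) factors as
    P(class_k | class_{k-1}) * P(phrase_k | class_k).
    """
    T = len(words)
    # best[(s, t)] = (log-prob, start of previous phrase) for the best
    # segmentation of words[:t] whose last phrase is words[s:t]; -1 = none.
    best: Dict[Tuple[int, int], Tuple[float, int]] = {}
    for t in range(1, T + 1):
        for s in range(max(0, t - max_len), t):
            phrase = tuple(words[s:t])
            if phrase not in phrase_class:
                continue  # not in the toy phrase inventory
            c = phrase_class[phrase]
            emit = math.log(member_prob[phrase])
            if s == 0:  # first phrase: condition on the sentence start
                p = class_bigram.get(("<s>", c), 0.0)
                if p > 0:
                    best[(0, t)] = (math.log(p) + emit, -1)
                continue
            cand: Optional[Tuple[float, int]] = None
            for r in range(max(0, s - max_len), s):
                if (r, s) not in best:
                    continue
                prev_c = phrase_class[tuple(words[r:s])]
                p = class_bigram.get((prev_c, c), 0.0)
                if p == 0:
                    continue
                score = best[(r, s)][0] + math.log(p) + emit
                if cand is None or score > cand[0]:
                    cand = (score, r)
            if cand is not None:
                best[(s, t)] = cand
    # Close the sentence with a transition to the end-of-sentence class.
    final: Optional[Tuple[float, int]] = None
    for s in range(max(0, T - max_len), T):
        if (s, T) not in best:
            continue
        p = class_bigram.get((phrase_class[tuple(words[s:T])], "</s>"), 0.0)
        if p > 0:
            score = best[(s, T)][0] + math.log(p)
            if final is None or score > final[0]:
                final = (score, s)
    if final is None:
        raise ValueError("no segmentation with nonzero probability")
    # Follow back-pointers from the last phrase to the first.
    segments: List[Phrase] = []
    t, s = T, final[1]
    while s != -1:
        segments.append(tuple(words[s:t]))
        t, s = s, best[(s, t)][1]
    segments.reverse()
    return segments, final[0]

# Toy, made-up model: phrases of different lengths grouped into classes.
phrase_class = {
    ("flights",): "F", ("flights", "from"): "FF",
    ("from",): "P", ("to",): "P",
    ("boston",): "CITY", ("denver",): "CITY",
}
member_prob = {
    ("flights",): 1.0, ("flights", "from"): 1.0,
    ("from",): 0.5, ("to",): 0.5,
    ("boston",): 0.5, ("denver",): 0.5,
}
class_bigram = {
    ("<s>", "F"): 0.5, ("<s>", "FF"): 0.5, ("F", "P"): 1.0,
    ("FF", "CITY"): 1.0, ("P", "CITY"): 1.0,
    ("CITY", "P"): 0.5, ("CITY", "</s>"): 0.5,
}

segments, score = best_segmentation(
    "flights from boston to denver".split(), phrase_class, class_bigram, member_prob
)
print(segments, math.exp(score))
```

With these toy numbers, the search prefers the segmentation [flights from] [boston] [to] [denver], with a probability of about 1/64, over the all-single-word segmentation, which scores 1/128. Each state is keyed by the span of the last phrase because a bigram dependency between phrases requires knowing which phrase came before. Nothing in the sketch requires members of a class to share a length, which fits the synopsis's point that one class can hold related phrases of different lengths.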

📜 SIMILAR VOLUMES

Predicting reading difficulty with statistical language models

✍ Kevyn Collins-Thompson; Jamie Callan 📂 Article 📅 2005 🏛 John Wiley and Sons 🌐 English ⚖ 373 KB

A potentially useful feature of information retrieval systems for students is the ability to identify documents that not only are relevant to the query but also match the student's reading level. Manually obtaining an estimate of reading difficulty for each document is not feasible for

Augmenting Naive Bayes Classifiers with Statistical Language Models

✍ Fuchun Peng; Dale Schuurmans; Shaojun Wang 📂 Article 📅 2004 🏛 Springer Netherlands 🌐 English ⚖ 153 KB

Dynamic Web log session identification with statistical language models

✍ Xiangji Huang; Fuchun Peng; Aijun An; Dale Schuurmans 📂 Article 📅 2004 🏛 John Wiley and Sons 🌐 English ⚖ 687 KB

A maximum entropy approach to adaptive statistical language modelling

✍ Ronald Rosenfeld 📂 Article 📅 1996 🏛 Elsevier Science 🌐 English ⚖ 320 KB

An adaptive statistical language model is described, which successfully integrates long distance linguistic information with other knowledge sources. Most existing statistical language models exploit only the immediate history of a text. To extract information from further back in the document's his

On a class of exactly soluble statistical mechanical models with nonpolynomial interactions

✍ J. G. Brankov; N. S. Tonchev; V. A. Zagrebnov 📂 Article 📅 1979 🏛 Springer 🌐 English ⚖ 624 KB

Minimal sufficient statistics for a general class of mixed models

✍ André I. Khuri 📂 Article 📅 1997 🏛 Elsevier Science 🌐 English ⚖ 303 KB

Khuri and Ghosh (1990) used a certain technique to obtain minimal sufficient statistics for the unbalanced random two-fold nested model. The present article provides an extension of this technique to a general class of unbalanced models with fixed and random effects.