A model for word clustering

✍ Scribed by Thom, James A.; Zobel, Justin

Publisher: John Wiley and Sons
Year: 1992
Tongue: English
Weight: 973 KB
Volume: 43
Category: Article
ISSN: 0002-8231
DOI: 10.1002/(sici)1097-4571(199210)43:9<616::aid-asi4>3.0.co;2-a

✦ Synopsis

It is common to model the distribution of words in text by measures such as the Poisson approximation.

However, these measures ignore effects such as clustering: our analysis of document collections demonstrates that the Poisson approximation can significantly overestimate the probability that a document contains a word. Based on our analysis, we propose a new model for the distribution of words in text, and show how this model can be used to estimate both the probability that a document contains a word and the number of distinct words in a document.
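As a rough illustration of the overestimate described in the synopsis, the sketch below compares a Poisson estimate of the probability that a document contains a word with the fraction of documents that actually contain it. The toy collection, the function names, and the assumption that all documents are the same length are illustrative choices, not taken from the article; the model the authors propose is not reproduced here.

```python
import math


def poisson_containment_probability(total_occurrences, num_documents):
    """Poisson estimate of P(document contains the word).

    Assumes occurrences are spread independently over equal-length
    documents, so the per-document rate is occurrences / documents.
    """
    rate = total_occurrences / num_documents
    return 1.0 - math.exp(-rate)


def observed_containment_fraction(docs, word):
    """Fraction of documents in which the word appears at least once."""
    return sum(1 for doc in docs if word in doc) / len(docs)


# Hypothetical toy collection: every occurrence of "fuzzy" is clustered
# in a single document.
docs = [
    ["fuzzy", "fuzzy", "fuzzy", "model"],
    ["word", "model", "text", "word"],
    ["text", "word", "model", "text"],
    ["model", "text", "word", "model"],
]

total = sum(doc.count("fuzzy") for doc in docs)
estimate = poisson_containment_probability(total, len(docs))
observed = observed_containment_fraction(docs, "fuzzy")

print(f"Poisson estimate: {estimate:.3f}")
print(f"Observed fraction: {observed:.3f}")
```

In this toy collection the three occurrences all fall in one document, so the Poisson estimate (about 0.53) is well above the observed fraction (0.25), which is the kind of gap clustering produces.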

📜 SIMILAR VOLUMES

A model for word clustering

✍ Harter, Stephen P. 📂 Article 📅 1993 🏛 John Wiley and Sons 🌐 English ⚖ 90 KB

pp. 459-476). Further evidence in support of their position can be found in the doctoral dissertation from Columbia University by Dr. Farideh Tehrani (Rutgers University Library) published as Negligence and Chaos: Bibliographical Access to Persian-Language Materials in the United States (Metuchen,

PARSER: A Model for Word Segmentation

✍ Pierre Perruchet; Annie Vinter 📂 Article 📅 1998 🏛 Elsevier Science 🌐 English ⚖ 128 KB

A clustering algorithm for fuzzy model identification

✍ Jian-Qin Chen; Yu-Geng Xi; Zhong-Jun Zhang 📂 Article 📅 1998 🏛 Elsevier Science 🌐 English ⚖ 741 KB

The fuzzy model proposed by Takagi and Sugeno can represent highly nonlinear systems and is widely used for the representation of fuzzy rules. In this paper, the model is firstly modified to make its identification easier. Based on the fuzzy c-partition space, four criteria are proposed for optimiza

Long distance bigram models applied to word clustering

✍ Nikoletta Bassiou; Constantine Kotropoulos 📂 Article 📅 2011 🏛 Elsevier Science 🌐 English ⚖ 305 KB

A parameter-free clustering model

✍ Israel Gitman 📂 Article 📅 1972 🏛 Elsevier Science 🌐 English ⚖ 427 KB

A language model based on semantically clustered words in a Chinese character recognition system

✍ Hsi-Jian Lee; Cheng-Huang Tung 📂 Article 📅 1997 🏛 Elsevier Science 🌐 English ⚖ 656 KB

This paper presents a new method for clustering the words in a dictionary into word groups. A Chinese character recognition system can then use these groups in a language model to improve the recognition accuracy. In the language model, the number of parameters we must train beforehand can be kept