A model for word clustering

Scribed by Thom, James A.; Zobel, Justin


Publisher: John Wiley and Sons
Year: 1992
Tongue: English
Weight: 973 KB
Volume: 43
Category: Article
ISSN: 0002-8231

Synopsis


It is common to model the distribution of words in text by measures such as the Poisson approximation.

However, these measures ignore effects such as clustering: our analysis of document collections demonstrates that the Poisson approximation can significantly overestimate the probability that a document contains a word. Based on our analysis, we propose a new model for distribution of words in text, and show how this model can be used to estimate the probability that a document contains a word and the number of distinct words in a document.
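As a rough illustration of the baseline the synopsis criticises, the sketch below computes the textbook Poisson approximation of the probability that a document contains a word, and sums it over a vocabulary to estimate the number of distinct words. The synopsis gives no formulas, so the rate definition (collection frequency scaled by document length) and all function and parameter names are assumptions made for illustration. This is not the authors' proposed model.

```python
import math


def poisson_contains_probability(term_count, collection_words, doc_words):
    """Approximate P(document contains the word at least once).

    Assumption: occurrences of the word in a document of `doc_words` words
    follow a Poisson distribution with rate
    term_count * doc_words / collection_words.
    """
    rate = term_count * doc_words / collection_words
    # 1 - exp(-rate), computed with expm1 for accuracy when rate is small.
    return -math.expm1(-rate)


def expected_distinct_words(term_counts, collection_words, doc_words):
    """Sum the per-word containment probabilities over the vocabulary."""
    return sum(
        poisson_contains_probability(count, collection_words, doc_words)
        for count in term_counts
    )


if __name__ == "__main__":
    # Hypothetical numbers: a word seen 10 times in a 1000-word collection,
    # and a 100-word document.
    print(poisson_contains_probability(10, 1000, 100))
    print(expected_distinct_words([10, 5, 1], 1000, 100))
```

Because this approximation treats occurrences as independent, it cannot capture clustering, which is the effect the synopsis says leads to overestimated containment probabilities.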


SIMILAR VOLUMES


A model for word clustering
Harter, Stephen P. | Article | 1993 | John Wiley and Sons | English | 90 KB

pp. 459-476). Further evidence in support of their position can be found in the doctoral dissertation from Columbia University by Dr. Farideh Tehrani (Rutgers University Library) published as Negligence and Chaos: Bibliographical Access to Persian-Language Materials in the United States (Metuchen,

PARSER: A Model for Word Segmentation
Pierre Perruchet; Annie Vinter | Article | 1998 | Elsevier Science | English | 128 KB

A clustering algorithm for fuzzy model i
Jian-Qin Chen; Yu-Geng Xi; Zhong-Jun Zhang | Article | 1998 | Elsevier Science | English | 741 KB

The fuzzy model proposed by Takagi and Sugeno can represent highly nonlinear systems and is widely used for the representation of fuzzy rules. In this paper, the model is first modified to make its identification easier. Based on the fuzzy c-partition space, four criteria are proposed for optimiza

A parameter-free clustering model
Israel Gitman | Article | 1972 | Elsevier Science | English | 427 KB

A language model based on semantically c
Hsi-Jian Lee; Cheng-Huang Tung | Article | 1997 | Elsevier Science | English | 656 KB

This paper presents a new method for clustering the words in a dictionary into word groups. A Chinese character recognition system can then use these groups in a language model to improve the recognition accuracy. In the language model, the number of parameters we must train beforehand can be kept