pp. 459-476). Further evidence in support of their position can be found in the doctoral dissertation from Columbia University by Dr. Farideh Tehrani (Rutgers University Library) published as Negligence and Chaos: Bibliographical Access to Persian-Language Materials in the United States (Metuchen,
A model for word clustering
Authors: Thom, James A.; Zobel, Justin
- Publisher: John Wiley and Sons
- Year: 1992
- Language: English
- File size: 973 KB
- Volume: 43
- Category: Article
- ISSN: 0002-8231
Synopsis
It is common to model the distribution of words in text by measures such as the Poisson approximation.
However, these measures ignore effects such as clustering: our analysis of document collections demonstrates that the Poisson approximation can significantly overestimate the probability that a document contains a word. Based on our analysis, we propose a new model for distribution of words in text, and show how this model can be used to estimate the probability that a document contains a word and the number of distinct words in a document.
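The overestimate described above can be illustrated with a minimal sketch. It assumes the simplest form of the Poisson approximation, in which every document has the same length, so that a word with f occurrences across N documents has rate f/N per document and the estimated probability that a document contains it is 1 - exp(-f/N). The function names and the toy corpus are hypothetical, invented for illustration; this is the baseline the abstract criticises, not the new model the authors propose.

```python
import math
from collections import Counter


def poisson_contains_probability(total_occurrences: int, num_documents: int) -> float:
    """Poisson estimate of P(a document contains the word at least once)."""
    rate = total_occurrences / num_documents  # mean occurrences per document
    return 1.0 - math.exp(-rate)


def observed_contains_fraction(documents: list[list[str]], word: str) -> float:
    """Fraction of documents in which the word actually appears."""
    hits = sum(1 for doc in documents if word in doc)
    return hits / len(documents)


def poisson_expected_distinct_words(documents: list[list[str]]) -> float:
    """Poisson estimate of the number of distinct words in one document.

    By linearity of expectation, this is the sum over the vocabulary of
    each word's estimated probability of appearing in a document.
    """
    counts = Counter(token for doc in documents for token in doc)
    n = len(documents)
    return sum(poisson_contains_probability(c, n) for c in counts.values())


# Toy corpus (made up): "poisson" occurs four times, all in one document,
# which mimics the clustering effect mentioned in the abstract.
documents = [
    "poisson poisson poisson poisson model".split(),
    "word clustering in text".split(),
    "document collections and retrieval".split(),
    "distinct words in a document".split(),
]

word = "poisson"
total = sum(doc.count(word) for doc in documents)
estimate = poisson_contains_probability(total, len(documents))
observed = observed_contains_fraction(documents, word)
distinct_estimate = poisson_expected_distinct_words(documents)

print(f"Poisson estimate: {estimate:.3f}, observed fraction: {observed:.3f}")
print(f"Poisson estimate of distinct words per document: {distinct_estimate:.3f}")
```

Because all four occurrences fall in a single document, the observed fraction of documents containing the word is much lower than the Poisson estimate, which spreads the occurrences independently across the collection.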
SIMILAR VOLUMES
The fuzzy model proposed by Takagi and Sugeno can represent highly nonlinear systems and is widely used for the representation of fuzzy rules. In this paper, the model is first modified to make its identification easier. Based on the fuzzy c-partition space, four criteria are proposed for optimiza
This paper presents a new method for clustering the words in a dictionary into word groups. A Chinese character recognition system can then use these groups in a language model to improve the recognition accuracy. In the language model, the number of parameters we must train beforehand can be kept