Probabilistic word classification based on a context-sensitive binary tree method
✍ Scribed by Jun Gao; XiXian Chen
- Book ID
- 102565950
- Publisher
- Elsevier Science
- Year
- 1997
- Language
- English
- File size
- 235 KB
- Volume
- 11
- Category
- Article
- ISSN
- 0885-2308
✦ Synopsis
Corpus-based, statistically oriented Chinese word classification can be regarded as a fundamental step for automatic or non-automatic, monolingual natural language processing systems. Word classification can alleviate the problem of data sparseness and leads to far fewer parameters. Much related work on word classification has been done so far, all of it based on some similarity metric. We use average mutual information as the global similarity metric for classification. The clustering process is top-down splitting, and the binary tree grows with each split. In natural language, the effects of the left and right neighbours of a word are asymmetric. To exploit this directional information, we introduce the left-right and the right-left binary trees to represent this property. Probability is also introduced in our algorithm to merge the resulting classes from the left-right and the right-left binary trees. We also use the resulting classes in experiments on a word class-based language model. Some of the resulting classes and the perplexity of the word class-based language model are presented.
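The synopsis names average mutual information as the global similarity metric but gives no formula. As a rough illustration only, the sketch below computes the average mutual information between the classes of adjacent words, using the standard definition: the sum over class pairs of p(c1, c2) * log2(p(c1, c2) / (p(c1) * p(c2))). The function name `average_mutual_information`, the `word_to_class` mapping, and the toy corpus are assumptions made for this example; they are not taken from the article, and the authors' exact formulation may differ.

```python
import math
from collections import Counter


def average_mutual_information(tokens, word_to_class):
    """Average mutual information between classes of adjacent words.

    Hypothetical helper for illustration; not the authors' implementation.
    tokens: list of words in corpus order.
    word_to_class: dict mapping every word to a class label.
    """
    # Count class bigrams (left class, right class) over adjacent word pairs.
    pair_counts = Counter(
        (word_to_class[a], word_to_class[b]) for a, b in zip(tokens, tokens[1:])
    )
    total = sum(pair_counts.values())
    if total == 0:
        return 0.0

    # Marginal counts of a class in the left and right bigram positions.
    left, right = Counter(), Counter()
    for (c1, c2), n in pair_counts.items():
        left[c1] += n
        right[c2] += n

    ami = 0.0
    for (c1, c2), n in pair_counts.items():
        p_joint = n / total
        p_left = left[c1] / total
        p_right = right[c2] / total
        ami += p_joint * math.log2(p_joint / (p_left * p_right))
    return ami


# Toy corpus (an assumption for this sketch): two words that strictly alternate.
tokens = "a b a b a b".split()
one_class = {"a": 0, "b": 0}    # everything in a single class
two_classes = {"a": 0, "b": 1}  # one candidate binary split

ami_one = average_mutual_information(tokens, one_class)
ami_two = average_mutual_information(tokens, two_classes)
```

In a top-down scheme like the one the synopsis describes, candidate binary splits could be compared by a score of this kind, preferring the split that retains the most mutual information. How the article actually searches over splits and merges the left-right and right-left trees is not specified here.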