In the past several years, a number of different language modeling improvements over simple trigram models have been found, including caching, higher-order n-grams, skipping, interpolated Kneser-Ney smoothing, and clustering. We present explorations of variations on, or of the limits of, each of the
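To make the baseline concrete, here is a minimal sketch (not from the paper) of the kind of trigram model these techniques improve on: maximum-likelihood unigram, bigram, and trigram estimates combined by simple linear interpolation. The function names, padding symbols, and interpolation weights are illustrative assumptions, not anything specified in the article.

```python
from collections import defaultdict

def train(tokens):
    """Count unigrams, bigram/trigram events, and their context totals."""
    uni = defaultdict(int)
    bi, bi_ctx = defaultdict(int), defaultdict(int)
    tri, tri_ctx = defaultdict(int), defaultdict(int)
    padded = ["<s>", "<s>"] + tokens  # pad so the first word has a context
    for i in range(2, len(padded)):
        w1, w2, w3 = padded[i - 2], padded[i - 1], padded[i]
        uni[w3] += 1
        bi[(w2, w3)] += 1
        bi_ctx[w2] += 1
        tri[(w1, w2, w3)] += 1
        tri_ctx[(w1, w2)] += 1
    return uni, bi, bi_ctx, tri, tri_ctx

def prob(w1, w2, w3, model, lambdas=(0.2, 0.3, 0.5)):
    """P(w3 | w1, w2) as a linear interpolation of uni/bi/trigram MLEs.

    The weights are placeholders; in practice they are tuned on held-out
    data (and fancier schemes such as Kneser-Ney replace the raw MLEs).
    """
    uni, bi, bi_ctx, tri, tri_ctx = model
    n = sum(uni.values())
    p1 = uni[w3] / n if n else 0.0
    p2 = bi[(w2, w3)] / bi_ctx[w2] if bi_ctx[w2] else 0.0
    p3 = tri[(w1, w2, w3)] / tri_ctx[(w1, w2)] if tri_ctx[(w1, w2)] else 0.0
    l1, l2, l3 = lambdas
    return l1 * p1 + l2 * p2 + l3 * p3
```

Interpolating with the lower-order models keeps unseen trigrams from receiving zero probability, which is the basic problem that the smoothing methods named in the abstract address more carefully.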
A Bit of Progress on Word-Based Language Models
By Yong Chen; Guo-Ping Chen
- Book ID
- 107482132
- Publisher
- Chinese Electronic Periodical Services
- Year
- 2003
- Language
- English
- File size
- 645 KB
- Volume
- 7
- Category
- Article
- ISSN
- 1007-6417
Free access; no registration required. For personal study only.
SIMILAR VOLUMES
This paper presents a new method for clustering the words in a dictionary into word groups. A Chinese character recognition system can then use these groups in a language model to improve recognition accuracy. In the language model, the number of parameters we must train beforehand can be kept
Experimental results show that a word-based arithmetic coding scheme achieves higher compression performance on Chinese text. However, arithmetic coding is a fractional-bit compression algorithm known to be time-consuming. In this article, we change direction and study ho