A bit of progress in language modeling
Author: Joshua T. Goodman
- Publisher: Elsevier Science
- Year: 2001
- Language: English
- File size: 356 KB
- Volume: 15
- Category: Article
- ISSN: 0885-2308
Synopsis
In the past several years, a number of different language modeling improvements over simple trigram models have been found, including caching, higher-order n-grams, skipping, interpolated Kneser-Ney smoothing, and clustering. We present explorations of variations on, or of the limits of, each of these techniques, including showing that sentence mixture models may have more potential. While all of these techniques have been studied separately, they have rarely been studied in combination. We compare a combination of all techniques together to a Katz smoothed trigram model with no count cutoffs. We achieve perplexity reductions between 38 and 50% (1 bit of entropy), depending on training data size, as well as a word error rate reduction of 8.9%. Our perplexity reductions are perhaps the highest reported compared to a fair baseline.
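The synopsis equates a 1-bit reduction in cross-entropy with roughly a 50% reduction in perplexity. The arithmetic behind that equivalence can be sketched as follows; the 8 bits/word baseline is an illustrative assumption, not a figure from the paper:

```python
import math

# Perplexity and cross-entropy (in bits per word) are related by
# PP = 2 ** H, so lowering H by exactly 1 bit halves the perplexity.

def perplexity(entropy_bits):
    """Perplexity corresponding to a cross-entropy in bits per word."""
    return 2.0 ** entropy_bits

def entropy(pp):
    """Cross-entropy in bits per word corresponding to a perplexity."""
    return math.log2(pp)

# Hypothetical baseline: a model at 8 bits/word has perplexity 256.
baseline_bits = 8.0
baseline_pp = perplexity(baseline_bits)        # 256.0

# A 1-bit entropy reduction halves the perplexity:
improved_pp = perplexity(baseline_bits - 1.0)  # 128.0
reduction = 1.0 - improved_pp / baseline_pp    # 0.5, i.e. a 50% reduction
```

This is why the paper can report its gains interchangeably as a 38–50% perplexity reduction or as up to 1 bit of entropy: the two scales are related by a base-2 exponential, independent of the baseline's absolute perplexity.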
SIMILAR VOLUMES
ChemInform is a weekly Abstracting Service, delivering concise information at a glance that was extracted from about 100 leading journals. To access a ChemInform Abstract of an article which was published elsewhere, please select a "Full Text" option. The original article is trackable v
The simulation of self-reproduction in systems of automata and biological cells is a promising technique for the study of biological self-organization. A special-purpose compiler (Cellular List-Processing Program, CLPP) has been written in SDS-920 machine language and used for models of this type.
An implemented model of language processing has been developed that views the propositional components of a sentence as neural units. The propositional sentence units are linked through symbolic, reified representations of subordinate sentence parts. Large numbers of these highly standardized propos