Computer aided lexicography
✍ Scribed by Ivan Sklenář; Václav Kříž
- Book ID
- 103045996
- Publisher
- Elsevier Science
- Year
- 1990
- Language
- English
- File size
- 255 KB
- Volume
- 61
- Category
- Article
- ISSN
- 0010-4655
✦ Synopsis
Programs with a natural-language user interface and text-processing programs require a vocabulary that maps each individual word form onto its lexeme, e.g. "says", "said", "saying" → "say". Examples of such programs are indexing programs for information retrieval and spelling correctors for text-processing systems.
Building such a computer vocabulary is especially difficult for Slavic languages, because their morphological structure is complex. An average Czech verb, for example, has 25 forms, and we have identified more than 100 paradigms for verbs.
To support the creation of a Czech vocabulary, we have designed a system of programs for paradigm identification and derivation of words. The result of our effort is a vocabulary comprising 110,000 words and 1,250,000 word forms. This vocabulary was used for the PASSAT system in the Czechoslovak Press Agency, and it may also be used in a spelling corrector. For such an application, however, the vocabulary must be compressed into a compact form in order to shorten the access times. Compression is based on the paradigmatic structure of morphology, which defines a set of suffixes for each word stem and a set of stems for each word.
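The compression idea in the synopsis can be illustrated with a minimal sketch. It is not the authors' implementation: the table names `PARADIGMS` and `STEMS`, the paradigm labels, and the functions `lookup` and `expand` are all hypothetical, and English stands in for Czech. Each stem points to a shared suffix set (a paradigm), and an irregular word such as "say" is represented by more than one stem.

```python
from collections import defaultdict

# Hypothetical paradigm table: paradigm label -> allowed suffixes.
# Many stems share one paradigm, so the suffixes are stored only once.
PARADIGMS = {
    "regular_verb": ("", "s", "ed", "ing"),
    "say_present": ("", "s", "ing"),
    "say_past": ("d",),
}

# Hypothetical compressed vocabulary: stem -> [(paradigm label, lexeme)].
# The lexeme "say" owns two stems, "say" and "sai".
STEMS = {
    "walk": [("regular_verb", "walk")],
    "say": [("say_present", "say")],
    "sai": [("say_past", "say")],
}


def lookup(word, stems, paradigms):
    """Return the lexemes of `word` by trying every stem/suffix split."""
    lexemes = set()
    for i in range(1, len(word) + 1):
        stem, suffix = word[:i], word[i:]
        for paradigm, lexeme in stems.get(stem, ()):
            if suffix in paradigms[paradigm]:
                lexemes.add(lexeme)
    return lexemes


def expand(stems, paradigms):
    """Rebuild the uncompressed table: word form -> set of lexemes."""
    forms = defaultdict(set)
    for stem, entries in stems.items():
        for paradigm, lexeme in entries:
            for suffix in paradigms[paradigm]:
                forms[stem + suffix].add(lexeme)
    return dict(forms)


if __name__ == "__main__":
    for word in ("says", "said", "saying", "walked", "sayed"):
        print(word, "->", sorted(lookup(word, STEMS, PARADIGMS)))
```

In this sketch, three stem entries and three suffix sets stand in for eight distinct word forms, and a lookup that returns an empty set (as for "sayed") is what a spelling corrector would flag as unknown.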