𝔖 Bobbio Scriptorium
✦ LIBER ✦

Stemming of French words based on grammatical categories

โœ Scribed by Savoy, Jacques


Publisher: John Wiley and Sons
Year: 1993
Tongue: English
Weight: 890 KB
Volume: 44
Category: Article
ISSN: 0002-8231


✦ Synopsis


Automatic indexing systems use suffix stripping algorithms to cluster various words derived from a common root under the same stem. Currently, removing affixes is either a context-free or a context-sensitive operation, where the context refers to the remaining stem. In this article, we propose a suffixing algorithm which uses grammatical categories to enhance the stemming process. This approach supports the use of foreign languages. In our case, the language is French, and a morphological analysis is required for removing inflectional suffixes or morphosyntactic variants of a lemma. After this analysis, we implement a suffix stripping algorithm which uses a dictionary and the grammatical categories to remove derivational suffixes. Our approach always returns a linguistically correct lemma, but not necessarily the "right" one. Based on our tests, this solution is an attractive one, with a mean error rate of 16%. We finish by explaining why we cannot expect significantly better results with this approach.
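The two-stage process described in the synopsis can be illustrated with a minimal Python sketch. It is not the article's algorithm: the lexicon, the lemma dictionary, the rule table, and the function names (`lemmatize`, `strip_derivation`, `stem`) are all hypothetical toy stand-ins, chosen only to show how a grammatical category and a dictionary check might constrain derivational suffix removal after inflection has been resolved.

```python
from typing import Optional, Tuple

# Toy lexicon (hypothetical): inflected form -> (lemma, grammatical category).
# It stands in for the morphological analysis step.
INFLECTED = {
    "nationales": ("national", "ADJ"),
    "nations": ("nation", "NOUN"),
    "chanteurs": ("chanteur", "NOUN"),
    "chantons": ("chanter", "VERB"),
}

# Toy dictionary of known lemmas and their categories (hypothetical).
LEMMAS = {
    "national": "ADJ",
    "nation": "NOUN",
    "chanteur": "NOUN",
    "chanter": "VERB",
}

# Hypothetical derivational rules:
# (source category, suffix to remove, replacement, expected target category).
RULES = [
    ("ADJ", "al", "", "NOUN"),      # national -> nation
    ("NOUN", "eur", "er", "VERB"),  # chanteur -> chanter
]


def lemmatize(word: str) -> Optional[Tuple[str, str]]:
    """Stage 1: resolve an inflected form to (lemma, category), if known."""
    return INFLECTED.get(word.lower())


def strip_derivation(lemma: str, category: str) -> str:
    """Stage 2: remove a derivational suffix only when the rule matches the
    lemma's category and the result is a dictionary lemma of the expected
    target category."""
    for src, suffix, replacement, target in RULES:
        if category == src and lemma.endswith(suffix):
            candidate = lemma[: len(lemma) - len(suffix)] + replacement
            if LEMMAS.get(candidate) == target:
                return candidate
    return lemma


def stem(word: str) -> str:
    """Run both stages; unknown words are returned lowercased and unchanged."""
    analysis = lemmatize(word)
    if analysis is None:
        return word.lower()
    lemma, category = analysis
    return strip_derivation(lemma, category)


if __name__ == "__main__":
    for w in ["nationales", "nations", "chanteurs", "chantons"]:
        print(w, "->", stem(w))
```

Because every candidate is checked against the dictionary, the sketch can only return a form that is itself a known lemma, which mirrors the synopsis's point that the result is linguistically correct without necessarily being the intended one.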


📜 SIMILAR VOLUMES


Image estimation of words based on adjec
โœ Kouhei Shimizu; Masafumi Hagiwara ๐Ÿ“‚ Article ๐Ÿ“… 2007 ๐Ÿ› John Wiley and Sons ๐ŸŒ English โš– 694 KB

In natural language, words convey various impressions such as “__kurai__ (dark)-__akarui__ (bright)” or “__kitanai__ (dirty)-__utsukushii__ (beautiful).” Impressions play an important role in inferring the speaker's intentions or feelings. Systems that use natural language to support co

General study of the distribution of N-t
โœ L Egghe ๐Ÿ“‚ Article ๐Ÿ“… 2000 ๐Ÿ› Elsevier Science ๐ŸŒ English โš– 508 KB

This paper establishes the general relation between the distribution of N-tuples of letters (e.g., N-truncations, N-grams) or words (e.g., N-word phrases) and the distributions of the single letters or words. Here the very general case is treated: the case where there is dependence on the place