
Design and implementation of automatic indexing for information retrieval with Arabic documents

✍ Scribed by Hmeidi, Ismail; Kanaan, Ghassan; Evens, Martha


Publisher: John Wiley and Sons
Year: 1997
Tongue: English
Weight: 238 KB
Volume: 48
Category: Article
ISSN: 0002-8231


✦ Synopsis


[...] National Conferences as a source. All these abstracts involve computer science and information systems. We also designed and built an automatic information retrieval system from scratch to handle Arabic data. The system was implemented in the C language using the GCC compiler and runs on IBM/PCs and compatible microcomputers. We have implemented both automatic and manual indexing techniques for this corpus. A long series of experiments using measures of recall and precision has demonstrated that automatic indexing is at least as effective as manual indexing and more effective in some cases. Since automatic indexing is both cheaper and faster, our results suggest that we can achieve a wider coverage of the literature with less money and produce as good results as with manual indexing. We have also compared the retrieval results using words as index terms versus stems and roots, and confirmed the results obtained by Al-Kharashi and Abu-Salem with smaller corpora that root indexing is more effective than word indexing.

[...] man, 1993). Arabic provides a very different context from English, since it is a non-Indo-European language with a complex morphological structure. Investigation of methods of automatic information retrieval for Arabic is essential to the growth of learning in the Arab world. Expansion of information retrieval systems is the simplest and most cost-effective way to make the resources of large reference libraries available to the increasing numbers of students and researchers in the Arab world.

1.1. Automatic Indexing

In the United States the large bibliographic database maintained by the National Library of Medicine is in[...]


than our best programs can now interpret them, they also make many mistakes in handling the huge numbers of
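
The synopsis reports that indexing methods were compared using recall and precision, but this excerpt does not define them. As a minimal sketch in C (the language the synopsis names for the system), assuming the standard textbook definitions, the two measures for a single query could be computed as follows. The struct and function names are hypothetical and are not taken from the authors' system.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-query counts; field names are illustrative only. */
typedef struct {
    size_t retrieved;          /* documents the system returned */
    size_t relevant;           /* documents judged relevant in the corpus */
    size_t relevant_retrieved; /* returned documents that are also relevant */
} QueryCounts;

/* Precision: share of retrieved documents that are relevant.
   Assumption: defined as 0.0 when nothing was retrieved. */
double precision(QueryCounts c)
{
    assert(c.relevant_retrieved <= c.retrieved);
    if (c.retrieved == 0) {
        return 0.0;
    }
    return (double)c.relevant_retrieved / (double)c.retrieved;
}

/* Recall: share of relevant documents that were retrieved.
   Assumption: defined as 0.0 when no document is relevant. */
double recall(QueryCounts c)
{
    assert(c.relevant_retrieved <= c.relevant);
    if (c.relevant == 0) {
        return 0.0;
    }
    return (double)c.relevant_retrieved / (double)c.relevant;
}
```

With made-up counts of 8 retrieved documents, 16 relevant documents, and 4 in both sets, `precision` returns 0.5 and `recall` returns 0.25. Comparing word, stem, and root indexing would then amount to computing these two values for the same queries under each indexing scheme.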