## Abstract
In this article, a word-oriented approximate string matching approach for searching Arabic text is presented. The distance between a pair of words is determined on the basis of aligning the two words by using occurrence heuristic tables. Two words are considered related if they have the
Using N-grams for Arabic text searching
Scribed by Suleiman H. Mustafa; Qasem A. Al-Radaideh
- Publisher
- John Wiley and Sons
- Year
- 2004
- Language
- English
- File size
- 231 KB
- Volume
- 55
- Category
- Article
- ISSN
- 1532-2882
Synopsis
Abstract
N-grams have been widely investigated for a number of text processing and retrieval applications. This article examines the performance of the digram and trigram term conflation techniques in the context of Arabic free text retrieval. It reports the results of using the N-gram approach for a corpus of thousands of distinct textual words drawn from a number of sources representing various disciplines. The results indicate that the digram method offers a better performance than trigram with respect to conflation precision and conflation recall ratios. In either case, the N-gram approach does not appear to provide an efficient conflation approach due to the peculiarities imposed by the Arabic infix structure that reduces the rate of correct N-gram matching.
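To make the digram/trigram comparison concrete, here is a minimal Python sketch of N-gram term conflation. It assumes the Dice coefficient over sets of distinct character N-grams as the similarity measure; the abstract does not say which measure the authors used, and the function names `ngrams` and `dice_similarity` are illustrative rather than taken from the article.

```python
def ngrams(word: str, n: int) -> set:
    """Return the set of distinct contiguous n-character substrings of word."""
    return {word[i:i + n] for i in range(len(word) - n + 1)}


def dice_similarity(a: str, b: str, n: int = 2) -> float:
    """Dice coefficient over N-gram sets (an assumed measure, not the article's)."""
    grams_a, grams_b = ngrams(a, n), ngrams(b, n)
    if not grams_a or not grams_b:
        # A word shorter than n has no N-grams, so nothing can match.
        return 0.0
    shared = len(grams_a & grams_b)
    return 2 * shared / (len(grams_a) + len(grams_b))


# Two words from the same root k-t-b: "kataba" (he wrote) and "katib" (writer).
# The second word carries an infixed alif between the first two root letters.
wrote = "كتب"
writer = "كاتب"

digram_score = dice_similarity(wrote, writer, n=2)
trigram_score = dice_similarity(wrote, writer, n=3)
print(digram_score, trigram_score)
```

In this pair the infixed letter breaks every trigram and all but one digram, so the trigram score drops to zero while the digram score stays above it. That is one small instance of the effect the abstract attributes to Arabic infix structure; it is not a reproduction of the reported results.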
SIMILAR VOLUMES
A method of full-text scanning for matches in a large dictionary of keywords is described, suitable for Selective Dissemination of Information (SDI). The method is applicable to large dictionaries (say 10^4 to 10^5 entries), and to arbitrary byte streams for both patterns and data samples. The appro