
Using N-grams for Arabic text searching

โœ Scribed by Suleiman H. Mustafa; Qasem A. Al-Radaideh


Publisher: John Wiley and Sons
Year: 2004
Tongue: English
Weight: 231 KB
Volume: 55
Category: Article
ISSN: 1532-2882


✦ Synopsis


Abstract

Nโ€grams have been widely investigated for a number of text processing and retrieval applications. This article examines the performance of the digram and trigram term conflation techniques in the context of Arabic free text retrieval. It reports the results of using the Nโ€gram approach for a corpus of thousands of distinct textual words drawn from a number of sources representing various disciplines. The results indicate that the digram method offers a better performance than trigram with respect to conflation precision and conflation recall ratios. In either case, the Nโ€gram approach does not appear to provide an efficient conflation approach due to the peculiarities imposed by the Arabic infix structure that reduces the rate of correct Nโ€gram matching.


📜 SIMILAR VOLUMES


Word-oriented approximate string matchin
โœ Suleiman H. Mustafa ๐Ÿ“‚ Article ๐Ÿ“… 2005 ๐Ÿ› John Wiley and Sons ๐ŸŒ English โš– 143 KB

## Abstract In this article, a wordโ€oriented approximate string matching approach for searching Arabic text is presented. The distance between a pair of words is determined on the basis of aligning the two words by using occurrence heuristic tables. Two words are considered related if they have the

An n-gram hash and skip algorithm for fi
โœ Jonathan D. Cohen ๐Ÿ“‚ Article ๐Ÿ“… 1998 ๐Ÿ› John Wiley and Sons ๐ŸŒ English โš– 407 KB

A method of full-text scanning for matches in a large dictionary of keywords is described, suitable for Selective Dissemination of Information (SDI). The method is applicable to large dictionaries (say 10⁴ to 10⁵ entries), and to arbitrary byte streams for both patterns and data samples. The appro …