๐”– Bobbio Scriptorium
โœฆ   LIBER   โœฆ

Information content in textual data: Revisited for Arabic text

Scribed by Hegazi, Nadia; Ali, Nabil; Abed, Ehsan


Publisher: John Wiley and Sons
Year: 1987
Tongue: English
Weight: 322 KB
Volume: 38
Category: Article
ISSN: 0002-8231


Synopsis


Arabic, in contrast to English, is a highly redundant language because of its morphological structure.

A study was carried out to measure this redundancy and compare it with the corresponding values for English. Samples from books, newspapers, and social magazines were used to estimate the entropy of Arabic with the n-gram method, with the n-grams generated from a moving window of eight characters. The dependencies between characters were studied, as was the average distribution of word lengths. The results indicate that Arabic is more compressible than English, which is attributed to its morphological nature. The average Arabic word was found to be longer than the average English word because Arabic words carry morphological extensions.
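The n-gram entropy measurement described in the synopsis can be illustrated with a minimal sketch. This is not the authors' procedure, which the synopsis does not spell out. It assumes the common Shannon-style estimate, in which the entropy of a character given its n-1 predecessors is the difference between the n-gram and (n-1)-gram block entropies, and redundancy is one minus the ratio of that value to the maximum entropy of the observed alphabet. The function names are hypothetical.

```python
import math
from collections import Counter


def block_entropy(text: str, n: int) -> float:
    """Shannon entropy, in bits, of the n-grams seen in a sliding window over text."""
    if n <= 0:
        return 0.0
    grams = Counter(text[i:i + n] for i in range(len(text) - n + 1))
    total = sum(grams.values())
    return -sum(c / total * math.log2(c / total) for c in grams.values())


def conditional_entropy(text: str, n: int) -> float:
    """Estimated bits per character given the n-1 preceding characters.

    Assumption: F_n = H(n-grams) - H((n-1)-grams). On short samples this
    estimate is biased and can even come out slightly negative.
    """
    return block_entropy(text, n) - block_entropy(text, n - 1)


def redundancy(text: str, n: int) -> float:
    """1 - F_n / log2(alphabet size), using the alphabet observed in text."""
    alphabet_size = len(set(text))
    if alphabet_size < 2:
        return 1.0  # a single repeated symbol carries no information
    return 1.0 - conditional_entropy(text, n) / math.log2(alphabet_size)


def mean_word_length(text: str) -> float:
    """Average length of whitespace-separated words."""
    words = text.split()
    return sum(len(w) for w in words) / len(words) if words else 0.0


if __name__ == "__main__":
    # Placeholder sample; a real measurement needs a large corpus.
    sample = "the quick brown fox jumps over the lazy dog " * 50
    for n in range(1, 9):  # window sizes 1..8, matching the eight-character window
        print(n, round(conditional_entropy(sample, n), 3))
    print("mean word length:", round(mean_word_length(sample), 2))
```

Running the same functions over comparable Arabic and English corpora would give the kind of per-language comparison the synopsis reports, although reliable estimates at n = 8 require far more text than this placeholder sample.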


SIMILAR VOLUMES


Using Category Information for Relations
Yan Qu; George Furnas; Ben Walstrum | Article | 2007 | Wiley (John Wiley & Sons) | English | 147 KB

Abstract: In the comprehension of textual data, it is critical for people to perceive relationships between topics. This work explores two approaches that use text categorizations to reveal underlying relationships: the Overlap approach, which visualizes overlaps between categories, and the Searc