Information content in textual data: Revisited for Arabic text
- Authors: Hegazi, Nadia; Ali, Nabil; Abed, Ehsan
- Publisher: John Wiley and Sons
- Year: 1987
- Language: English
- File size: 322 KB
- Volume: 38
- Category: Article
- ISSN: 0002-8231
Synopsis
Unlike English, Arabic is a highly redundant language because of its morphological nature.
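For reference, the standard Shannon definitions behind "redundancy" and n-gram entropy estimates are given below. The synopsis does not state the exact estimator the authors used, so treat these as background rather than as the paper's own formulation. Here $\mathcal{A}$ is the character alphabet, $b$ ranges over blocks of $N$ consecutive characters, $F_N$ is the entropy of a character given the preceding $N-1$ characters, and $R$ is the redundancy:

$$
H_N = -\sum_{b \in \mathcal{A}^N} p(b)\,\log_2 p(b), \qquad
F_N = H_N - H_{N-1}, \qquad
R = 1 - \frac{F_N}{\log_2 |\mathcal{A}|}
$$

As a purely illustrative calculation (not a result from the study): with a 27-symbol alphabet (26 letters plus space), $\log_2 27 \approx 4.75$ bits, so a hypothetical estimate of $F_N = 2.0$ bits per character would give $R \approx 1 - 2.0/4.75 \approx 0.58$. A higher $R$ means the text is more predictable and therefore more compressible.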
A study was conducted to measure this redundancy and compare it with the corresponding values for English. Samples of books, newspapers, and social magazines were used to measure the entropy of Arabic with the n-gram method, with n-grams generated from a moving window of eight characters. The dependencies between characters were also studied, as was the average distribution of word lengths. The results indicated that Arabic is more compressible than English, which is attributable to its morphological nature. Arabic words were also found to be longer on average than English words, because Arabic words carry morphological extensions.
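The measurements described above can be sketched in Python as follows. This is an illustrative reconstruction under stated assumptions, not the authors' procedure: the function names are hypothetical, entropy is estimated from raw n-gram frequencies for n = 1 to 8 (mirroring the eight-character window), the window wraps around the end of the text, words are whitespace-separated tokens, and the alphabet size used for the redundancy estimate is simply the number of distinct characters in the sample.

```python
import math
from collections import Counter


def block_entropy(text: str, n: int) -> float:
    """H_n: entropy in bits of the n-character blocks of `text`.

    Assumption: the window slides one character at a time and wraps around
    the end of the text, so every n yields exactly len(text) blocks.
    """
    if n == 0:
        return 0.0
    if n > len(text):
        raise ValueError("n must not exceed the text length")
    wrapped = text + text[:n - 1]
    blocks = Counter(wrapped[i:i + n] for i in range(len(text)))
    total = len(text)
    return -sum(c / total * math.log2(c / total) for c in blocks.values())


def conditional_entropies(text: str, max_n: int = 8) -> list[float]:
    """F_n = H_n - H_(n-1) for n = 1..max_n, in bits per character."""
    h = [block_entropy(text, n) for n in range(max_n + 1)]
    return [h[n] - h[n - 1] for n in range(1, max_n + 1)]


def redundancy(entropy: float, alphabet_size: int) -> float:
    """Shannon redundancy R = 1 - H / log2(alphabet size)."""
    return 1.0 - entropy / math.log2(alphabet_size)


def word_length_distribution(text: str) -> tuple[float, dict[int, float]]:
    """Mean word length and the relative frequency of each word length.

    Assumption: words are whitespace-separated tokens.
    """
    lengths = [len(w) for w in text.split()]
    if not lengths:
        return 0.0, {}
    counts = Counter(lengths)
    total = len(lengths)
    return sum(lengths) / total, {k: counts[k] / total for k in sorted(counts)}


if __name__ == "__main__":
    # Toy English sample; the study used books, newspapers, and magazines.
    sample = "the cat sat on the mat " * 5
    f = conditional_entropies(sample, max_n=8)
    for n, fn in enumerate(f, start=1):
        print(f"F_{n} = {fn:.3f} bits/char")
    # Assumption: alphabet size = number of distinct characters in the sample.
    print("R (from F_8):", round(redundancy(f[-1], len(set(sample))), 3))
    print("word lengths:", word_length_distribution(sample))
```

Wrapping the window around the end of the text gives every n the same number of windows, which keeps the estimates F_n non-negative and non-increasing in n, at the cost of one artificial join between the last and first characters. Frequency-based estimates of F_n for large n are biased downward on small samples, so a meaningful comparison between Arabic and English would need corpus-sized text rather than the toy sample shown here.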