The Internet and the World Wide Web have become an integral part of everyday life, an important source of information and a communication medium. One of the main problems non-English speakers face when using the Internet is that it is heavily dominated by the English language. Knowledge of Engl
An image-based automatic Arabic translation system
Authors: Yi Chang; Datong Chen; Ying Zhang; Jie Yang
- Publisher: Elsevier Science
- Year: 2009
- Language: English
- File size: 377 KB
- Volume: 42
- Category: Article
- ISSN: 0031-3203
Synopsis
In this paper, we present a system that automatically translates Arabic text embedded in images into English. The system consists of three components: text detection from images, character recognition, and machine translation. We formulate text detection as a binary classification problem and apply gradient boosting trees (GBT), support vector machines (SVM), and location-based prior knowledge to improve the F1 score of text detection from 78.95% to 87.05%. The detected text images are processed by off-the-shelf optical character recognition (OCR) software. We employ an error correction model to post-process the noisy OCR output and apply a bigram language model to reduce word segmentation errors. The translation module is tailored with a compact data structure for hand-held devices. The experimental results show substantial improvements in both word recognition accuracy and translation quality. For instance, in the experiment on the Arabic Transparent font, the BLEU score increases from 18.70 to 33.47 with the error correction module.
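The synopsis reports text-detection quality as an F1 score, the harmonic mean of precision and recall on the text class. The minimal Python sketch below computes it from binary labels (1 = text, 0 = background). It does not reproduce the GBT/SVM classifiers or the location prior, and the synopsis does not say whether pixels, blocks, or regions are being labelled, so the generic per-sample labels here are an assumption.

```python
from typing import Sequence


def f1_score(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """F1 for the positive class (1 = text, 0 = background)."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    # Count true positives, false positives, and false negatives for the text class.
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0  # precision or recall is zero (or undefined), so F1 is taken as 0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    truth = [1, 1, 1, 0, 0, 1]
    pred = [1, 0, 1, 1, 0, 1]
    print(f1_score(truth, pred))  # 3 TP, 1 FP, 1 FN -> 0.75
```

Equivalently, F1 = 2TP / (2TP + FP + FN), which for the toy labels gives 6 / 8 = 0.75.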
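The synopsis does not describe the form of its error correction model. One common approach, assumed here and not claimed to be the authors' method, is a noisy-channel corrector: for an OCR token o, choose the lexicon word w that maximizes log P(w) + log P(o | w), approximating the channel term as a fixed log-penalty per Levenshtein edit. The toy English lexicon, the `error_logprob` penalty, and the `max_edits` cutoff below are hypothetical.

```python
import math
from typing import Dict

# Hypothetical toy lexicon: word -> log P(word).
LEXICON: Dict[str, float] = {
    "translation": math.log(0.001),
    "transition": math.log(0.0005),
    "image": math.log(0.002),
    "text": math.log(0.003),
}


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance with unit-cost insert, delete, and substitute."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,                # delete ca
                           cur[j - 1] + 1,             # insert cb
                           prev[j - 1] + (ca != cb)))  # substitute or match
        prev = cur
    return prev[-1]


def correct(ocr_word: str, lexicon: Dict[str, float],
            error_logprob: float = math.log(0.01), max_edits: int = 2) -> str:
    """Return argmax_w log P(w) + d(ocr_word, w) * error_logprob over nearby words.

    Falls back to the raw OCR token when no lexicon word is within max_edits.
    """
    best_word, best_score = ocr_word, -math.inf
    for word, logp in lexicon.items():
        d = edit_distance(ocr_word, word)
        if d > max_edits:
            continue  # too far from the OCR output to be a plausible correction
        score = logp + d * error_logprob
        if score > best_score:
            best_word, best_score = word, score
    return best_word


if __name__ == "__main__":
    print(correct("transIation", LEXICON))
```

Here the OCR token "transIation" is one substitution (I for l) away from "translation" but two edits away from "transition", so the corrector returns "translation". A real system for Arabic OCR would need a channel model trained on actual confusion statistics rather than a uniform per-edit penalty.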
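The synopsis applies a bigram language model to reduce word segmentation errors. As a minimal sketch of that general idea (not the authors' implementation), the dynamic program below re-segments a string whose spaces were lost, choosing the word sequence with the highest bigram log-score. The toy English tables, the `<s>` start token, and the fixed backoff penalty for unseen bigrams (in the spirit of "stupid backoff", so scores are not normalized probabilities) are all assumptions for illustration.

```python
import math
from typing import Dict, List, Tuple

# Hypothetical toy model: log-probabilities for a handful of English words.
UNIGRAM: Dict[str, float] = {
    "the": math.log(0.3),
    "cat": math.log(0.1),
    "cats": math.log(0.1),
    "at": math.log(0.1),
    "sat": math.log(0.1),
}
BIGRAM: Dict[Tuple[str, str], float] = {
    ("<s>", "the"): math.log(0.5),
    ("the", "cat"): math.log(0.2),
    ("cat", "sat"): math.log(0.3),
    ("the", "cats"): math.log(0.1),
}
BACKOFF = math.log(0.4)  # assumed fixed penalty for unseen bigrams


def log_prob(prev: str, word: str) -> float:
    """Bigram score, backing off to a penalized unigram score when unseen."""
    if (prev, word) in BIGRAM:
        return BIGRAM[(prev, word)]
    return BACKOFF + UNIGRAM[word]


def segment(text: str, max_word_len: int = 20) -> List[str]:
    """Viterbi-style search over (end position, last word) states."""
    n = len(text)
    # best[j][w] = (score, (start, prev_word)) for the best path ending at j with word w
    best: List[Dict[str, Tuple[float, Tuple[int, str]]]] = [{} for _ in range(n + 1)]
    best[0]["<s>"] = (0.0, (-1, ""))
    for j in range(1, n + 1):
        for i in range(max(0, j - max_word_len), j):
            word = text[i:j]
            if word not in UNIGRAM:
                continue  # only in-vocabulary words can be emitted
            for prev, (score, _) in best[i].items():
                s = score + log_prob(prev, word)
                if word not in best[j] or s > best[j][word][0]:
                    best[j][word] = (s, (i, prev))
    if not best[n]:
        raise ValueError("no segmentation covers the input with this vocabulary")
    # Trace back from the highest-scoring final state.
    word = max(best[n], key=lambda w: best[n][w][0])
    j, words = n, []
    while j > 0:
        words.append(word)
        _, (i, prev) = best[j][word]
        word, j = prev, i
    return words[::-1]


if __name__ == "__main__":
    print(segment("thecatsat"))
```

With these tables, `segment("thecatsat")` returns `['the', 'cat', 'sat']`: its score, log(0.5 · 0.2 · 0.3), beats the backed-off score of "the cats at", log(0.5 · 0.1 · 0.4 · 0.1). A practical OCR post-processor would also need a fallback for out-of-vocabulary character runs instead of raising an error.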
Similar volumes
An automatic visual inspection system designed for dirt inspection in the pulp and paper industry is presented. A new hierarchical region oriented segmentation algorithm is introduced. The algorithm is tuned according to the singular characteristics of the pulp samples. A criterion based on the maxi
Abstract. Purpose: To demonstrate a robust registration method of brain magnetic resonance (MR) images based on the Talairach reference system with automatic determination of the fiducial points. Materials and Methods: Eight specified landmark points of the Talairach reference system are
The illustrations in biomedical publications often provide useful information in aiding clinicians' decisions when full text searching is performed to find evidence in support of a clinical decision. In this research, image analysis and classification techniques are explored to automatically extract