English-Wolaytta machine translation system using statistical approach

✍ Scribed by Melaku Mara

Year: 2018
Tongue: English
Leaves: 60
Category: Library


✦ Table of Contents

Abstract
Machine translation is a technology for the automatic translation of text or speech from one natural language to another. English documents need to be translated into Wolaytta to make them available in Wolaytta and to reduce the language barrier. This study therefore develops an English-Wolaytta machine translation system using a statistical approach.
To achieve this objective, a bilingual corpus of 30,000 parallel sentences was collected from the spiritual domain, together with a monolingual corpus of 39,893 sentences from various sources. The corpora were prepared for the development process (normalization, tokenization, lowercasing, and cleaning) and divided into training, tuning, and test sets. The parallel sentences were aligned manually, and freely available tools were used for the remaining tasks: the SRILM toolkit for language modeling, MGIZA++ for word-level alignment of the corpus using IBM Models 1-5, and Moses for decoding, all running on the Ubuntu operating system, which suits the Moses environment. In addition, the unsupervised morpheme segmentation tool Morfessor was used to segment the Wolaytta text.
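The abstract names MGIZA++ and IBM Models 1-5 without showing what word alignment computes. As a rough illustration only, the Python sketch below implements expectation-maximization for IBM Model 1, the simplest model in that sequence; it is not MGIZA++'s implementation, which goes on to the higher models. The function names (`train_ibm_model1`, `viterbi_align`) are hypothetical, and the three German-English sentence pairs are a textbook-style toy example standing in for English-Wolaytta data, not material from the thesis corpus. Here `e` is the conditioning (English) side and `f` the generated side, and each sentence receives an extra `<null>` token so that a target word can align to nothing.

```python
from collections import defaultdict

NULL = "<null>"  # empty word that target tokens may align to

# Hypothetical toy corpus (German-English), standing in for English-Wolaytta pairs.
TOY_PAIRS = [
    ("the house".split(), "das haus".split()),
    ("the book".split(), "das buch".split()),
    ("a book".split(), "ein buch".split()),
]


def train_ibm_model1(pairs, iterations=20):
    """EM estimation of lexical translation probabilities t(f | e).

    pairs: list of (e_tokens, f_tokens). Returns a dict mapping (f, e) -> probability.
    """
    f_vocab = {f for _, fs in pairs for f in fs}
    t = defaultdict(lambda: 1.0 / len(f_vocab))  # uniform initialisation
    for _ in range(iterations):
        count = defaultdict(float)  # expected number of (f, e) links
        total = defaultdict(float)  # expected number of links involving e
        # E-step: spread each f token over all e tokens of its sentence (plus NULL).
        for es, fs in pairs:
            es = [NULL] + es
            for f in fs:
                norm = sum(t[(f, e)] for e in es)
                for e in es:
                    frac = t[(f, e)] / norm
                    count[(f, e)] += frac
                    total[e] += frac
        # M-step: renormalise expected counts into a distribution over f for each e.
        t = defaultdict(float, {(f, e): c / total[e] for (f, e), c in count.items()})
    return t


def viterbi_align(t, es, fs):
    """Link each f token to its most probable e token (or NULL)."""
    return [(f, max([NULL] + es, key=lambda e: t[(f, e)])) for f in fs]


if __name__ == "__main__":
    table = train_ibm_model1(TOY_PAIRS)
    print(viterbi_align(table, "the house".split(), "das haus".split()))
```

Because "das" co-occurs with "the" in two of the toy pairs, EM gradually shifts probability mass so that "das" links to "the", which in turn leaves "haus" to be explained by "house".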
Two sets of experiments were run, one on the unsegmented corpus and one on the segmented corpus, each using 5,000, 10,000, 15,000, 20,000, 25,000, and 30,000 parallel sentences. Across these six corpus sizes, the unsegmented corpus achieved BLEU scores of 4.91%, 6.30%, 7.21%, 7.60%, 7.96%, and 8.46%, and the segmented corpus achieved 9.83%, 11.38%, 12.70%, 12.77%, 12.93%, and 13.21%, respectively. Performance improved both as the corpus grew and when the corpus was morphologically segmented.
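The scores above are BLEU values expressed as percentages. As a hedged illustration of what that metric measures, the sketch below computes corpus-level BLEU from clipped (modified) n-gram precisions for n = 1 to 4, their geometric mean, and a brevity penalty, assuming a single reference translation per sentence and no smoothing. The abstract does not state which BLEU implementation produced its numbers, and real scorers such as Moses' `multi-bleu.perl` or sacreBLEU differ in tokenization and other details, so this sketch is not expected to reproduce the reported figures. The names `ngrams` and `corpus_bleu` are hypothetical.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Multiset of the n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def corpus_bleu(hypotheses, references, max_n=4):
    """Corpus-level BLEU, one reference per sentence, no smoothing; returns a value in [0, 1]."""
    matches = [0] * max_n
    totals = [0] * max_n
    hyp_len = ref_len = 0
    for hyp, ref in zip(hypotheses, references):
        hyp_len += len(hyp)
        ref_len += len(ref)
        for n in range(1, max_n + 1):
            hyp_counts = ngrams(hyp, n)
            ref_counts = ngrams(ref, n)
            # Clip each hypothesis n-gram count by its count in the reference.
            matches[n - 1] += sum(min(c, ref_counts[g]) for g, c in hyp_counts.items())
            totals[n - 1] += max(len(hyp) - n + 1, 0)
    # Without smoothing, any zero precision makes the geometric mean zero.
    if hyp_len == 0 or min(matches) == 0:
        return 0.0
    log_precision = sum(math.log(m / t) for m, t in zip(matches, totals)) / max_n
    # Brevity penalty: only hypotheses shorter than the references are penalised.
    brevity_penalty = 1.0 if hyp_len > ref_len else math.exp(1 - ref_len / hyp_len)
    return brevity_penalty * math.exp(log_precision)


if __name__ == "__main__":
    reference = "a b c d e f".split()
    # A correct but truncated hypothesis: perfect precisions, reduced by the brevity penalty.
    print(corpus_bleu([reference[:4]], [reference]))
```

Multiplying the return value by 100 gives the percentage form used in the abstract; a value of 0.1321, for instance, corresponds to 13.21%.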
Based on these experiments, the researcher observed that performance improves with a larger corpus and with morphological segmentation. Future research should therefore focus on improving the system further by increasing the corpus size and applying morphological segmentation.
List of Tables
List of Figures
Acronyms and Abbreviations
CHAPTER ONE
1. INTRODUCTION
1.1. Introduction
1.2. Background
1.3. Statement of the Problem
1.4. Objectives of the Study
1.4.1. General Objective
1.4.2. Specific Objectives
1.5. Methodologies
1.5.1. Literature Review
1.5.2. Data Collection
1.5.3. Tools and Techniques
1.5.4. Evaluation
1.6. Scope and Limitations of the Study
1.6.1. Scope of the Study
1.6.2. Limitations of the Study
1.7. Contribution of the Study
1.8. Organization of the Thesis
CHAPTER TWO
2. WOLAYTTA LANGUAGE
2.1. Introduction
2.2. Overview of Wolaytta Language
2.3. Morphology
2.3.1. Morphological Analysis
2.3.2. Morphological Synthesis
2.4. Morphology of Wolaytta
2.4.1. Personal Pronouns
2.4.2. Subject Verb Agreement
2.4.3. Nouns
2.4.4. Gender
2.4.5. Number
2.5. Wolaytta Language Writing System
2.6. Wolaytta Language Sentence Structure
2.7. Articles
2.8. Punctuation Marks
2.9. Conjunctions
CHAPTER THREE
3. LITERATURE REVIEW
3.1. Introduction
3.2. Machine Translation (MT)
3.3. Approaches of Machine Translation
3.3.1. Statistical Machine Translation (SMT)
Language Modeling
Translation Modeling
Decoder
3.3.2. Rule Based Machine Translation (RBMT)
3.3.3. Example Based Machine Translation (EBMT)
3.3.4. Hybrid Machine Translation (HMT)
3.3.5. Neural Machine Translation (NMT)
3.4. Evaluation of Machine Translation
3.5. Related Works
3.5.1. English–Afaan Oromo Machine Translation: An Experiment Using Statistical Approach
3.5.2. Bidirectional English-Amharic Machine Translation: An Experiment using Constrained Corpus
3.5.3. Preliminary Experiments on English-Amharic Statistical Machine Translation (EASMT)
3.5.4. Bidirectional English–Afaan Oromo Machine Translation Using Hybrid Approach
3.5.5. English-Tigrigna Factored Statistical Machine Translation
3.5.6. Bidirectional Tigrigna-English Statistical Machine Translation
CHAPTER FOUR
4. DEVELOPMENT OF ENGLISH-WOLAYTTA SMT
4.1. Introduction
4.2. Architecture of the English-Wolaytta SMT
4.3. Corpus Collection and Preparation
4.3.1. Preliminary Preparation
4.3.2. Bilingual Corpus
4.3.3. Monolingual Corpus
4.3.4. Language Model
4.3.5. Translation Model
4.3.6. Decoding
4.4. Software
CHAPTER FIVE
5. EXPERIMENT
5.1. Introduction
5.2. Experiment
5.2.1. Experiment-I: Unsegmented Corpus Set
5.2.2. Experiment-II: Segmented Corpus Set
5.3. Discussion
CHAPTER SIX
6. CONCLUSION AND RECOMMENDATION
6.1. Conclusion
6.2. Recommendation
7. References

✦ Subjects

Wolaitta; Wolaytta; language; orthography

📜 SIMILAR VOLUMES

📁 Statistical Machine Translation

✍ Philipp Koehn 📂 Library 🌐 English

📁 Statistical machine translation : textbook

✍ Philipp Koehn 📂 Library 📅 2010 🏛 Univ. Pr 🌐 English

Preface; Part I. Foundations: 1. Introduction; 2. Words, sentences, corpora; 3. Probability theory; Part II. Core Methods: 4. Word-based models; 5. Phrase-based models; 6. Decoding; 7. Language models; 8. Evaluation; Part III. Advanced Topics: 9. Discriminative training; 10. Integrating linguistic

📁 Machine translation systems

✍ Jonathan Slocum (ed.) 📂 Library 📅 1988 🏛 Cambridge University Press 🌐 English

📁 Machine translation systems

✍ Slocum 📂 Library 📅 1988 🏛 Cambridge University Press 🌐 English

Out of this opportunity evolved a request for me to edit a special issue of the journal Computational Linguistics, to be devoted to MT and, the Editorial Board permitting, to include my COLING paper. So it was that in 1985, Issues 1-3 of Volume 11 presented a collection of papers on MT by most of th

📁 Syntax-based Statistical Machine Translation

✍ Philip Williams, Rico Sennrich, Matt Post, Philipp Koehn 📂 Library 📅 2016 🏛 Morgan & Claypool 🌐 English

📁 Hybrid Approaches to Machine Translation

✍ Babych, Bogdan; Banchs, Rafael E.; Costa-jussà, Marta R.; Eberle, Kurt; Lambert, 📂 Library 📅 2016 🌐 English

This volume provides an overview of the field of Hybrid Machine Translation (MT) and presents some of the latest research conducted by linguists and practitioners from different multidisciplinary areas. Nowadays, most important developments in MT are achieved by combining data-driven and rule-