Machine Learning in Translation Corpora Processing

✍ Scribed by Wołk, Krzysztof

Publisher: Chapman and Hall/CRC
Year: 2019
Tongue: English
Leaves: 281
Category: Library

✦ Synopsis

Abstract: This book reviews ways to improve statistical machine speech translation between Polish and English. Research has been conducted mostly on dictionary-based, rule-based, and syntax-based machine translation techniques. Most popular methodologies and tools are not well suited to the Polish language and therefore require adaptation, and language resources are lacking in parallel and monolingual data. The main objective of this volume is to develop an automatic and robust Polish-to-English translation system to meet specific translation requirements and to develop bilingual textual resources by mining comparable corpora.

✦ Table of Contents

Content: Cover
Title Page
Copyright Page
Acknowledgements
Preface
Table of Contents
Abbreviations and Definitions
Overview
1: Introduction
1.1 Background and context
1.1.1 The concept of cohesion
1.2 Machine translation (MT)
1.2.1 History of statistical machine translation (SMT)
1.2.2 Statistical machine translation approach
1.2.3 SMT applications and research trends
2: Statistical Machine Translation and Comparable Corpora
2.1 Overview of SMT
2.2 Textual components and corpora
2.2.1 Words
2.2.2 Sentences
2.2.3 Corpora
2.3 Moses tool environment for SMT
2.3.1 Tuning for quality
2.3.2 Operation sequence model (OSM)
2.3.3 Minimum error rate training tool
2.4 Aspects of SMT processing
2.4.1 Tokenization
2.4.2 Compounding
2.4.3 Language models
2.4.3.1 Out of vocabulary words
2.4.3.2 N-gram smoothing methods
2.4.4 Translation models
2.4.4.1 Noisy channel model
2.4.4.2 IBM models
2.4.4.3 Phrase-based models
2.4.5 Lexicalized reordering
2.4.5.1 Word alignment
2.4.6 Domain text adaptation
2.4.6.1 Interpolation
2.4.6.2 Adaptation of parallel corpora
2.5 Evaluation of SMT quality
2.5.1 Current evaluation metrics
2.5.1.1 BLEU overview
2.5.1.2 Other SMT metrics
2.5.1.3 HMEANT metric
2.5.1.3.1 Evaluation using HMEANT
2.5.1.3.2 HMEANT calculation
2.5.2 Statistical significance test
3: State of the Art
3.1 Current methods and results in spoken language translation
3.2 Recent methods in comparable corpora exploration
3.2.1 Native Yalign method
3.2.2 A* algorithm for alignment
3.2.3 Needleman-Wunsch algorithm
3.2.4 Other alignment methods
4: Author's Solutions to PL-EN Corpora Processing Problems
4.1 Parallel data mining improvements
4.2 Multi-threaded, tuned and GPU-accelerated Yalign
4.2.1 Needleman-Wunsch algorithm with GPU optimization
4.2.2 Comparison of alignment methods
4.3 Tuning of Yalign method
4.4 Minor improvements in mining for Wikipedia exploration
4.5 Parallel data mining using other methods
4.5.1 The pipeline of tools
4.5.2 Analogy-based method
4.6 SMT metric enhancements
4.6.1 Enhancements to the BLEU metric
4.6.2 Evaluation using enhanced BLEU metric
4.7 Alignment and filtering of corpora
4.7.1 Corpora used for alignment experiments
4.7.2 Filtering and alignment algorithm
4.7.3 Filtering results
4.7.4 Alignment evaluation results
4.8 Baseline system training
4.9 Description of experiments
4.9.1 Text alignment processing
4.9.2 Machine translation experiments
4.9.2.1 TED lectures translation
4.9.2.1.1 Word stems and SVO word order
4.9.2.1.2 Lemmatization
4.9.2.1.3 Translation and translation parameter adaptation experiments
4.9.2.2 Subtitles and EuroParl translation
4.9.2.3 Medical texts translation
4.9.2.4 Pruning experiments
4.9.3 Evaluation of obtained comparable corpora
4.9.3.1 Native Yalign method
4.9.3.2 Improved Yalign method
4.9.3.3 Parallel data mining using tool pipeline

✦ Subjects

Polish language -- Machine translating; English language -- Machine translating; Machine translating; COMPUTERS / General; COMPUTERS / Machine Theory; MATHEMATICS / Arithmetic

📜 SIMILAR VOLUMES

📁 Learning machine translation

✍ Cyril Goutte; et al 📂 Library 📅 2009 🏛 MIT Press 🌐 English

📁 Learning machine translation

✍ Cyril Goutte, Nicola Cancedda, Marc Dymetman, George Foster 📂 Library 📅 2009 🏛 MIT Press 🌐 English

📁 Advances in Machine Learning and Signal Processing: Proceedings of MALSIP 2015

✍ Ping Jack Soh, Wai Lok Woo, Hamzah Asyrani Sulaiman, Mohd Azlishah Othman, Mohd 📂 Library 📅 2016 🏛 Springer International Publishing 🌐 English

This book presents important research findings and recent innovations in the field of machine learning and signal processing. A wide range of topics relating to machine learning and signal processing techniques and their applications are addressed in order to provide both researchers and practiti

📁 Machine Translation and Foreign Language Learning (New Frontiers in Translation Studies)

✍ Kizito Tekwa 📂 Library 📅 2024 🏛 Springer 🌐 English

The book investigates how machine translation (MT) provides opportunities and increases the willingness to communicate in a foreign language. It is informed by a mixed methods methodological approach that analyzes quantitative and qualitative data of questionnaires and real-time instant mes

📁 Machine Learning and Deep Learning in Natural Language Processing

✍ Anitha S. Pillai and Roberto Tedesco 📂 Library 📅 2024 🏛 CRC Press 🌐 English

Natural Language Processing (NLP) is a sub-field of Artificial Intelligence, linguistics, and computer science and is concerned with the generation, recognition, and understanding of human languages, both written and spoken. NLP systems examine the grammatical structure of sentences as well as the s

📁 Coherence: In Signal Processing and Machine Learning

✍ David Ramírez, Ignacio Santamaría, Louis Scharf 📂 Library 📅 2023 🏛 Springer 🌐 English

This book organizes principles and methods of signal processing and machine learning into the framework of coherence. The book contains a wealth of classical and modern methods of inference, some reported here for the first time. General results are applied to problems in communications, co