
Machine Learning in Translation Corpora Processing

✍ Scribed by Wołk, Krzysztof


Publisher: Chapman and Hall/CRC
Year: 2019
Tongue: English
Leaves: 281
Category: Library


✦ Synopsis




Abstract: This book reviews ways to improve statistical machine speech translation between Polish and English. Research has been conducted mostly on dictionary-based, rule-based, and syntax-based machine translation techniques. The most popular methodologies and tools are not well suited to the Polish language and therefore require adaptation, and both parallel and monolingual language resources are lacking. The main objective of this volume is to develop an automatic and robust Polish-to-English translation system that meets specific translation requirements, and to develop bilingual textual resources by mining comparable corpora.

✦ Table of Contents


Cover
Title Page
Copyright Page
Acknowledgements
Preface
Table of Contents
Abbreviations and Definitions
Overview
1: Introduction
1.1 Background and context
1.1.1 The concept of cohesion
1.2 Machine translation (MT)
1.2.1 History of statistical machine translation (SMT)
1.2.2 Statistical machine translation approach
1.2.3 SMT applications and research trends
2: Statistical Machine Translation and Comparable Corpora
2.1 Overview of SMT
2.2 Textual components and corpora
2.2.1 Words
2.2.2 Sentences
2.2.3 Corpora
2.3 Moses tool environment for SMT
2.3.1 Tuning for quality
2.3.2 Operation sequence model (OSM)
2.3.3 Minimum error rate training tool
2.4 Aspects of SMT processing
2.4.1 Tokenization
2.4.2 Compounding
2.4.3 Language models
2.4.3.1 Out of vocabulary words
2.4.3.2 N-gram smoothing methods
2.4.4 Translation models
2.4.4.1 Noisy channel model
2.4.4.2 IBM models
2.4.4.3 Phrase-based models
2.4.5 Lexicalized reordering
2.4.5.1 Word alignment
2.4.6 Domain text adaptation
2.4.6.1 Interpolation
2.4.6.2 Adaptation of parallel corpora
2.5 Evaluation of SMT quality
2.5.1 Current evaluation metrics
2.5.1.1 BLEU overview
2.5.1.2 Other SMT metrics
2.5.1.3 HMEANT metric
2.5.1.3.1 Evaluation using HMEANT
2.5.1.3.2 HMEANT calculation
2.5.2 Statistical significance test
3: State of the Art
3.1 Current methods and results in spoken language translation
3.2 Recent methods in comparable corpora exploration
3.2.1 Native Yalign method
3.2.2 A* algorithm for alignment
3.2.3 Needleman-Wunsch algorithm
3.2.4 Other alignment methods
4: Author's Solutions to PL-EN Corpora Processing Problems
4.1 Parallel data mining improvements
4.2 Multi-threaded, tuned and GPU-accelerated Yalign
4.2.1 Needleman-Wunsch algorithm with GPU optimization
4.2.2 Comparison of alignment methods
4.3 Tuning of Yalign method
4.4 Minor improvements in mining for Wikipedia exploration
4.5 Parallel data mining using other methods
4.5.1 The pipeline of tools
4.5.2 Analogy-based method
4.6 SMT metric enhancements
4.6.1 Enhancements to the BLEU metric
4.6.2 Evaluation using enhanced BLEU metric
4.7 Alignment and filtering of corpora
4.7.1 Corpora used for alignment experiments
4.7.2 Filtering and alignment algorithm
4.7.3 Filtering results
4.7.4 Alignment evaluation results
4.8 Baseline system training
4.9 Description of experiments
4.9.1 Text alignment processing
4.9.2 Machine translation experiments
4.9.2.1 TED lectures translation
4.9.2.1.1 Word stems and SVO word order
4.9.2.1.2 Lemmatization
4.9.2.1.3 Translation and translation parameter adaptation experiments
4.9.2.2 Subtitles and EuroParl translation
4.9.2.3 Medical texts translation
4.9.2.4 Pruning experiments
4.9.3 Evaluation of obtained comparable corpora
4.9.3.1 Native Yalign method
4.9.3.2 Improved Yalign method
4.9.3.3 Parallel data mining using tool pipeline

✦ Subjects


Polish language -- Machine translating; English language -- Machine translating; Machine translating; COMPUTERS / General; COMPUTERS / Machine Theory; MATHEMATICS / Arithmetic


📜 SIMILAR VOLUMES


Learning machine translation
✍ Cyril Goutte, Nicola Cancedda, Marc Dymetman, George Foster 📂 Library 📅 2009 🏛 MIT Press 🌐 English
Advances in Machine Learning and Signal
✍ Ping Jack Soh, Wai Lok Woo, Hamzah Asyrani Sulaiman, Mohd Azlishah Othman, Mohd 📂 Library 📅 2016 🏛 Springer International Publishing 🌐 English

This book presents important research findings and recent innovations in the field of machine learning and signal processing. A wide range of topics relating to machine learning and signal processing techniques and their applications are addressed in order to provide both researchers and practiti

Machine Translation and Foreign Language
✍ Kizito Tekwa 📂 Library 📅 2024 🏛 Springer 🌐 English

The book investigates how machine translation (MT) provides opportunities and increases the willingness to communicate in a foreign language. It is informed by a mixed methods methodological approach that analyzes quantitative and qualitative data of questionnaires and real-time instant mes

Machine Learning and Deep Learning in Na
✍ Anitha S. Pillai and Roberto Tedesco 📂 Library 📅 2024 🏛 CRC Press 🌐 English

Natural Language Processing (NLP) is a sub-field of Artificial Intelligence, linguistics, and computer science and is concerned with the generation, recognition, and understanding of human languages, both written and spoken. NLP systems examine the grammatical structure of sentences as well as the s

Coherence: In Signal Processing and Mach
✍ David Ramírez, Ignacio Santamaría, Louis Scharf 📂 Library 📅 2023 🏛 Springer 🌐 English

This book organizes principles and methods of signal processing and machine learning into the framework of coherence. The book contains a wealth of classical and modern methods of inference, some reported here for the first time. General results are applied to problems in communications, co