Multiword Units in Machine Translation and Translation Technology

✍ Edited by Ruslan Mitkov, Johanna Monti, Gloria Corpas Pastor, Violeta Seretan

Publisher: John Benjamins Publishing Company
Year: 2018
Language: English
Pages: 271
Series: Current Issues in Linguistic Theory 341
Category: Library

✦ Synopsis

The correct interpretation of Multiword Units (MWUs) is crucial to many applications in Natural Language Processing (NLP), but it is a challenging and complex task. In recent years, the computational treatment of MWUs has received considerable attention, but much remains to be done before we can claim that NLP and Machine Translation (MT) systems process MWUs successfully.

This volume provides a general overview of the field, with particular reference to Machine Translation and Translation Technology, and covers English, Basque, French, Romanian, German, Dutch and Croatian, among other languages. Its chapters approach this challenge from a variety of angles, including rule-based approaches, compound splitting techniques, MWU identification methodologies in multilingual applications, and MWU alignment issues.

✦ Table of Contents

MULTIWORD UNITS IN MACHINE TRANSLATION AND TRANSLATION TECHNOLOGY
Editorial page
Title page
LCC data
Table of contents
About the editors
Multiword units in machine translation and translation technology
1. Introduction
2. Multiword units in natural language processing
2.1 Historical notes
2.2 POS tagging and parsing
2.3 Word sense disambiguation
2.4 Information extraction and information retrieval
2.5 Other applications
3. Multiword unit processing in machine translation
3.1 Historical notes
3.2 Multiword unit processing in RBMT
3.3 Multiword unit processing in EBMT
3.4 Multiword unit processing in SMT
4. Multiword units in translation technology
References
Analysing linguistic information about word combinations for a Spanish-Basque rule-based machine tra
1. Introduction
2. Definitions, challenges and treatment of MWUs in MT
3. Linguistic analysis of Basque and Spanish noun + verb combinations
3.1 Noun + verb combinations in bilingual dictionaries
3.1.1 Basque and Spanish noun + verb combinations in the dictionary
3.1.2 Translations of noun + verb combinations in the dictionary
3.1.3 Equivalences of noun + verb constructions in translations
3.2 Contrasting information with parallel corpora
3.3 Classification of the Spanish MWUs
3.3.1 Syntactic flexibility
3.3.2 Semantic compositionality
4. Evaluation of MWU detection and translation adequacy
4.1 Evaluation of MWU detection
4.2 Evaluation of MWU translation quality in an RBMT system
5. Conclusions and future work
Acknowledgements
References
How do students cope with machine translation output of multiword units? An exploratory study
1. Introduction
2. Experimental set-up
3. Analysis
4. Conclusion
5. Future work
References
Aligning verb + noun collocations to improve a French-Romanian FSMT system
1. Context and motivation
2. Handling MWEs for MT
3. Collocation definition
4. Translation problems
5. The Architecture of the FSMT system and verb + noun collocation integration
6. Preprocessing Verb + Noun collocations
7. The MWE dictionary
8. The collocation alignment algorithm
9. Experiments
9.1 MWEs and the lexical alignment system
9.2 MWEs and FSMT system
9.3 MWE identification before aligning
10. Conclusions and future work
References
Multiword expressions in multilingual information extraction
1. Introduction
2. Application context
3. MWEs in multilingual information processing
3.1 MWE extraction
3.2 MWE lexical representation
3.2.1 Design
3.2.2 The lexicon
3.2.2.1 General annotations
3.2.2.2 MWE extensions
3.3 MWE analysis and identification
3.3.1 Design
3.3.1.1 MWE processing after analysis
3.3.1.2 MWE processing before analysis
3.3.2 MWE treatment in analysis
3.3.2.1 Preprocessing
3.3.2.2 Chart initialisation
3.3.2.3 Analysis design
3.3.2.4 Multiword analysis rules
3.3.2.5 Analysis output
3.4 MWE translation and generation
3.4.1 Transfer
3.4.2 Generation
4. Evaluation
4.1 MWE coverage: rule – lexicon compatibility
4.2 Lexicon coverage
5. Conclusion
Acknowledgements
References
A multilingual gold standard for translation spotting of German compounds and their corresponding multiword units in English, French, Italian and Spanish
1. Introduction
1.1 Related work
2. Resources
2.1 Selection and preprocessing of the gold standard material
3. Evaluation and discussion
3.1 Quality of universal part-of-speech (UPOS) tagging
3.2 Aligned UPOS tags across languages
3.3 Complexity of the compounds and aligned MWUs
3.4 Evaluation of the quality of the automatic GIZA++ word alignment
3.5 Optimisation of the directed word alignments through symmetrisation
3.6 Frequency effects
3.7 Effects of morphological complexity
3.8 Lexicalisation and variability
4. Conclusion
Acknowledgements
References
Appendix
Dutch compound splitting for bilingual terminology extraction
1. Introduction
2. Dutch compound splitter
2.1 Domain adaptation
2.2 Data Sets and Experiments
3. Impact on word alignment
3.1 Data sets and experiments
4. Impact on terminology extraction
4.1 Experiments
5. Conclusion
References
A flexible framework for collocation retrieval and translation from parallel and comparable corpora
1. Introduction
2. Phraseology
2.1 Typologies of collocations
2.2 Transfer rules
3. Related work
3.1 Collocation retrieval
3.2 Parallel corpora
3.3 Comparable corpora
4. System
4.1 Candidate selection module
4.2 Candidate filtering module
4.3 Dictionary look-up module
4.4 Parallel corpora module
4.5 Comparable corpora module
5. Evaluation
5.1 Experimental setup
5.2 Experimental results
5.3 Discussion and future work
Acknowledgements
References
On identification of bilingual lexical bundles for translation purposes
1. Introduction
2. Background and related work
3. Research material and methodology
4. Results
5. Discussion and conclusions
References
Appendix 1. Pivot LBs in EPILs
The quest for Croatian idioms as multiword units
1. Introduction
2. Theoretical background
2.1 Idioms as a type of MWU in Croatian language
2.2 Importance of idiom detection in translation
2.3 Previous work
3. Corpus of Croatian idioms
4. NooJ – NLP tool of our choice
5. Dictionaries and syntactic grammars
6. Classification of idioms
6.1 Idioms of Type 1
6.2 Idioms of Type 2
6.3 Idioms of Type 3
6.4 Idioms of Type 4
6.5 Idioms of Type 5
7. Results
8. Conclusion
Acknowledgements
References
Corpus analysis of Croatian constructions with the verb doći ‘to come’
1. Introduction
2. Lexicographic description of the verb doći in contemporary Croatian dictionaries
3. Corpus analysis of the verb doći
3.1 Collexeme analysis of lemma doći and constructions doći do and doći na
3.1.1 Corpus analysis of the construction doći do ‘to come to’
3.1.2 Corpus analysis of the construction doći na ‘to come onto’
4. Conclusion
References
Anaphora resolution, collocations and translation
1. Introduction
2. Collocations in Translation
3. Translating collocations with Its-2
3.1 Anaphora resolution
4. Results and evaluation
4.1 Two experiments
4.2 Evaluation of the precision of the AR procedure
5. Conclusion
References
Index
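
The chapter on multilingual information extraction contrasts MWE processing after analysis (as in early MT systems such as LMT and METAL, where the transfer component handled MWEs) with MWE processing before analysis, in which a special component looks over the input string and marks MWE candidates so that analysis starts with them as single nodes; at chart initialisation, lexicon entries are reached through a hash table. The Python sketch below illustrates only the pre-analysis marking idea and is not the chapter's implementation: the toy lexicon `MWE_LEXICON` and the helpers `build_index` and `mark_mwes` are hypothetical, and greedy longest-match lookup over contiguous, lowercase surface tokens is an illustrative choice. The chapter's system segments input into allomorphs so that inflected forms also reach their entry, which this sketch does not attempt.

```python
# Hypothetical sketch: mark MWE candidates before analysis so that each
# matched MWE becomes a single node. Not the chapter's actual code.

# Toy lexicon (hypothetical): each MWE is a tuple of lowercase surface tokens.
MWE_LEXICON = [
    ("take", "into", "account"),
    ("machine", "translation"),
    ("machine", "translation", "system"),
]


def build_index(entries: list[tuple[str, ...]]) -> dict[str, list[tuple[str, ...]]]:
    """Index MWEs in a dict (a hash table) keyed by their first token."""
    index: dict[str, list[tuple[str, ...]]] = {}
    for entry in entries:
        index.setdefault(entry[0], []).append(entry)
    # Try longer candidates first so that the longest match wins.
    for candidates in index.values():
        candidates.sort(key=len, reverse=True)
    return index


def mark_mwes(tokens: list[str], index: dict[str, list[tuple[str, ...]]]) -> list[str]:
    """Greedy longest-match pass; a matched MWE is joined into one node."""
    nodes: list[str] = []
    i = 0
    while i < len(tokens):
        for candidate in index.get(tokens[i], []):
            if tuple(tokens[i:i + len(candidate)]) == candidate:
                nodes.append(" ".join(candidate))  # one node for the whole MWE
                i += len(candidate)
                break
        else:
            # No MWE starts at this position: keep the token as its own node.
            nodes.append(tokens[i])
            i += 1
    return nodes
```

For example, `mark_mwes("we take into account the machine translation system output".split(), build_index(MWE_LEXICON))` returns `['we', 'take into account', 'the', 'machine translation system', 'output']`. Trying longer candidates first lets the three-word entry win over *machine translation*; an inflected form such as *takes into account* would not match without a morphological step.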

📜 SIMILAR VOLUMES

📁 Nation, Language, and the Ethics of Translation (Translation Transnation)

✍ Sandra Bermann, Michael Wood 📂 Library 📅 2005 🏛 Princeton University Press 🌐 English

In recent years, scholarship on translation has moved well beyond the technicalities of converting one language into another and beyond conventional translation theory. With new technologies blurring distinctions between "the original" and its reproductions, and with globalization redefining nationa

📁 Explorations in Empirical Translation Process Research (Machine Translation: Technologies and Applications, 3)

✍ Michael Carl (editor) 📂 Library 📅 2021 🏛 Springer 🌐 English

This book assembles fifteen original, interdisciplinary research chapters that explore methodological and conceptual considerations as well as user and usage studies to elucidate the relation between the translation product and translation/post-editing processes. It introduces numerous inno

📁 Times of Mobility: Transnational Literature and Gender in Translation

✍ Jasmina Lukic (editor); Sibelan Forrester (editor); Borbála Faragó (editor) 📂 Library 📅 2020 🏛 Central European University Press 🌐 English

In an era of increased mobility and globalisation, a fast-growing body of writing originates from authors who live in-between languages and cultures. In response to this challenge, a transnational perspective offers a new approach to the growing body of cultural texts with an emphasis on experience

📁 The Spread of Novels: Translation and Prose Fiction in the Eighteenth Century (Translation Transnation)

✍ Mary Helen McMurran 📂 Library 📅 2009 🌐 English

Fiction has always been in a state of transformation and circulation: how does this history of mobility inform the emergence of the novel? The Spread of Novels explores the active movements of English and French fiction in the eighteenth century and argues that the new literary form of the novel was

📁 Feminist Translation Studies: Local and Transnational Perspectives

✍ Olga Castro; Emek Ergun 📂 Library 📅 2017 🏛 Routledge 🌐 English

Feminist Translation Studies: Local and Transnational Perspectives situates feminist translation as political activism. Chapters highlight the multiple agendas and visions of feminist translation and the different political voices and cultural heritages through which it speaks across times a

📁 Translation in Transition: Between cognition, computing and technology

✍ Arnt Lykke Jakobsen, Bartolomé Mesa-Lao 📂 Library 📅 2017 🏛 John Benjamins Publishing Company 🌐 English

Translation practice and workflows have witnessed significant changes during the last decade. New market demands to handle digital content as well as technological advances are leading this transition. The development and integration of machine translation systems have given post-editing practices a