<p><strong>Harald Burger</strong>, University of Zürich, Switzerland; <strong>Dmitri Dobrovol’skij</strong>, Russian Academy of Science, Moscow, Russia; <strong>Peter Kühn</strong>, University of Trier, Germany; <strong>Neal R. Norrick</strong>, Saarland University, Germany.<br></p>
Computational Phraseology
Edited by Gloria Corpas Pastor and Jean-Pierre Colson
- Publisher: John Benjamins Publishing Company
- Year: 2020
- Language: English
- Pages: 341
- Series: IVITRA Research in Linguistics and Literature: Studies, Editions and Translations
- Category: Library
Synopsis
Whether you wish to deliver on a promise, take a walk down memory lane or even on the wild side, phraseological units (also often referred to as phrasemes or multiword expressions) are present in most communicative situations and in all the world's languages. Phraseology, the study of phraseological units, has therefore become a rare unifying theme across linguistic theories.
In recent years, an increasing number of studies have addressed the computational treatment of multiword expressions. Among other things, these studies cover their automatic identification, extraction or translation, and the role they play in various Natural Language Processing applications. Computational Phraseology is a comparatively new field in which better understanding and further advances are urgently needed. This book aims to address this pressing need by bringing together contributions that approach this promising interdisciplinary field from different perspectives.
Table of Contents
Computational Phraseology
Editorial page
Title page
Copyright page
Table of contents
Foreword
The chapters
Profiling phraseology in different languages
Measures for phraseology discovery
All we need is corpora
References
Introduction
References
Monocollocable words: A type of language combinatory periphery
0. Opening
1. By way of introduction
2. Substance and definition of monocollocable words
3. Are there monocollocable words on the language periphery only?
4. Distribution of monocollocable words
5. Language combinations and language periphery
6. Identification of MWS in corpus
7. Outlook and applications
References
Translation asymmetries of multiword expressions in machine translation: An analysis of the TED-MWE corpus
1. Introduction
2. Related work
3. The TED-MWE corpus
4. The annotation guidelines
5. The annotation methodology
Individual annotation
Inter-annotation validation
Evaluation
6. The results of the annotation process
7. Translation asymmetries and mistranslations in the TED-MWE corpus
8. Conclusions and future work
References
Correspondence information
German constructional phrasemes and their Russian counterparts: A corpus-based study
1. Introduction
2. German deictic elements hin and her: Semantics and combinatorial potential
3. Construction vor sich her: Underlying pattern, semantics and Russian counterparts
3.1 The constructional phraseme [vor sich her + v] and its underlying pattern
3.2 Semantic derivation, construction polysemy and lexicographic description
4. The construction vor sich hin: Semantics, co-occurrence types and Russian counterparts
4.1 Types of verbs co-occurring with vor sich hin
4.2 Vor sich hin: Semantic features
5. Conclusion
Funding
References
Corpora
Computational phraseology and translation studies: From theoretical hypotheses to practical tools
1. Introduction
2. Phraseology and translation studies
3. Problems posed by phraseology to human translation
4. Problems posed by phraseology to machine translation
5. Theoretical hypotheses
6. Towards new practical tools
7. Conclusion
References
Computational extraction of formulaic sequences from corpora: Two case studies of a new extraction algorithm
1. Introduction
1.1 Counting co-occurrences
1.2 N-Gram sizes/configurations and the problem of redundancy
1.3 Recent approaches
2. The MERGE algorithm
3. Case study 1: MERGE vs. AFL
3.1 Materials
3.2 Results
3.3 Interim conclusions
4. Case study 2: Exploring MERGE in the context of L1 acquisition
4.1 Materials and methods
4.2 Results
4.3 Discussion
5. Conclusion
References
Appendix. Summary statistics for the linear model on the acquisition data
Computational phraseology discovery in corpora with the mwetoolkit
1. Introduction
2. Computational phraseology discovery
2.1 General architecture
2.2 Freely available tools
3. The mwetoolkit
4. Phraseology discovery with the mwetoolkit
4.1 Candidate search patterns
4.2 Association scores
4.3 Other scores
5. Conclusions and open issues
References
Multiword expressions in comparable corpora
1. Comparable corpora: A brief survey
2. Aranea comparable corpora
2.1 Methodology
2.2 Available corpora
2.3 Access to CC
3. Multi-word expressions in comparable corpora
3.1 Competition between monolingual and comparable corpora
3.2 Data mining in comparable corpora
4. Conclusion
References
Collecting collocations from general and specialised corpora: A comparative analysis
1. Introduction
2. Lexical combinations in terminology and lexicography
3. A comparative analysis
3.1 Corpora
3.2 Lexical items selected
3.3 Automated extraction of collocations
4. Observations on the lists of candidate collocations
4.1 Overlap of candidate collocates
4.2 Rank of candidates
4.3 How collocates reveal specific meanings of items
5. Concluding remarks: Summary and guidelines for terminologists
Acknowledgements
References
Appendix
Résumé
Funding information
What matters more: The size of the corpora or their quality?: The case of automatic translation of multiword expressions using comparable corpora
1. Rationale
2. Our methodology for translating multiword expressions
3. Data and experiments
3.1 Comparable corpora
3.2 Data
3.3 Vector representations
3.4 Gold standard
4. Comparable corpora and translation of MWEs: Size vs. quality
5. Conclusion
References
Statistical significance for measures of collocation strength (WP3)
1. Introduction
2. The chi-squared test (χ²)
3. The log-likelihood test (G²)
4. Fisher’s exact test
5. The z-score
6. The t-test
7. Pointwise mutual information
8. Computer simulations to estimate statistical significance
9. The Poisson distribution
10. Confidence limits of the mean and standard deviation
11. Experimental comparison of measures
12. Conclusion
References
Verbal collocations and pronominalisation
1. Introduction
2. Parsing and collocation detection
3. Anaphora resolution
4. Verbal collocations and pronominalisation
5. Experimental results
5.1 Evaluation methodology
5.2 Evaluation results
6. Conclusion
References
Empirical variability of Italian multiword expressions as a useful feature for their categorisation
1. Introduction
2. Anomalous behaviours of Italian Multiword Expressions
3. A quantitative approach to MWEs
3.1 Reasons to go beyond statistics
3.2 Reasons for an empirical, quantitative approach to MWEs
4. Methodology
4.1 Syntactic variations
4.2 Lexical variations
4.3 Inflectional variations
5. Analysis and results
6. Conclusion
References
Too big to fail but big enough to pay for their mistakes: A collostructional analysis of the patterns [too ADJ to V] and [ADJ enough to V]
1. Introduction
2. Background
2.1 Descriptive background
2.2 Methodological background
3. Case studies
3.1 Data: Source, extraction, cleaning
3.2 Case study: Simple collexeme analysis (SCA)
3.3 Case study: Distinctive collexeme analysis (DCA)
3.4 Case study: Co-varying collexeme analysis (CCA)
3.5 Case study: Distinctive co-varying collexeme analysis (DCCA)
4. Summary
References
Multi-word patterns and networks: How corpus-driven approaches have changed our description of language use
1. Introduction
2. The rocky road of qualitative interpretation
3. Kinds of lexical fixedness
3.1 From multiword expressions to patterns
3.2 MWPs as autonomous units
3.3 Extended context patterns (ECPs)
4. Corpus-linguistic methodology and interpretation
4.1 Corpus searches
4.2 Collocation profiles
4.3 KWIC bundles and slot-filler analysis
5. A new type of corpus-driven, pattern-based MW dictionaries
6. Conclusion
References
Internet sources
Abbreviations
How context determines meaning
1. Patterns and valency
2. The verb is the pivot of the clause
3. Collocations and lexical sets
4. Core meaning
5. Phrasal verbs
6. Exploiting established phraseology
6.1 Phraseology that is both literal and figurative
7. Exploiting a proverb
8. Other idioms with ‘blow’
9. Conclusion
References
Detecting semantic difference: A new model based on knowledge and collocational association
1. Introduction
2. Related work
3. Methodology
3.1 Association-based score
3.2 Google N-Grams
3.3 Word embedding-based score
3.4 ConceptNet score
4. Experiments
4.1 Data
4.2 Experimental setup
4.3 Evaluation metrics
5. Results and discussion
6. Conclusion
References
Index
Similar Volumes
Long regarded as a peripheral issue, phraseology is now taking centre stage in a wide range of fields. This recent explosion of interest undoubtedly has a great deal to do with the development of corpus linguistics research, which has both demonstrated the key role of phraseological expressions in l
This unique volume showcases the best presentations of the international conference “Phraseology in Multilingual Society” held at Kazan Federal University, Russia, in August 2013. The twenty-seven essays included here represent different research efforts by specialists in phraseology from around the
This paper explores ways in which research into collocation should be improved. After a discussion of the parameters underlying the notion of collocation, the paper has three main parts. First, I argue that corpus linguistics would benefit from taking more seriously the understudied fact that colloc