Most of the papers in this volume were first presented at the Workshop on Cross-Linguistic Information Retrieval that was held on August 22, 1996, during the SIGIR'96 Conference. Alan Smeaton of Dublin University and Paraic Sheridan of the ETH, Zurich, were the two other members of the Scientific C
Cross-Language Information Retrieval
Scribed by Jian-Yun Nie
- Publisher
- Morgan & Claypool
- Year
- 2010
- Tongue
- English
- Leaves
- 142
- Series
- Synthesis Lectures on Human Language Technologies #8
- Category
- Library
Table of Contents
Cross-Language Information Retrieval
Synthesis Lectures on Human Language Technologies
ABSTRACT
Keywords
Dedication
Contents
Preface
Acknowledgement
Chapter 1 Introduction
1.1 General IR problems
1.2 General IR approaches
1.2.1 IR Models
1.2.1.1 Boolean Models
1.2.1.2 Vector Space Model
1.2.1.3 Probabilistic Models
1.2.1.4 Statistical Language Models
1.2.2 Query Expansion
1.2.3 System Evaluation
1.3 Language problems in IR
1.3.1 European Languages
1.3.1.1 Word Stemming
1.3.1.2 Decompounding
1.3.2 East Asian Languages
1.3.2.1 Chinese and Word Segmentation
1.3.2.2 Japanese and Korean
1.3.3 Other Languages
1.4 The problems of cross-language information retrieval
1.4.1 Query Translation vs. Document Translation
1.4.2 Using Pivot Language and Interlingua
1.5 Approaches to translation in CLIR
1.6 The need for cross-language and multilingual IR
1.7 The history of CLIR
Chapter 2 Using Manually Constructed Translation Systems and Resources for CLIR
2.1 Machine translation
2.1.1 Rule-Based MT
2.1.2 Statistical MT
2.2 Basic utilization of MT in CLIR
2.2.1 Rule-Based MT
2.2.2 Statistical MT
2.2.3 Unknown Word
2.3 Open the box of MT
2.4 Dictionary-based translation for CLIR
2.4.1 Basic Approaches
2.4.2 The Term Weighting Problem
2.4.3 Coverage of the Dictionary
2.4.4 Translation Ambiguity
2.4.5 Selection of Translation Words
2.4.6 Other Related Approaches
2.4.6.1 Phrase-Based and Structured Query Translation
2.4.6.2 Using Multilingual Thesauri
Chapter 3 Translation Based on Parallel and Comparable Corpora
3.1 Parallel corpora
3.2 Paragraph/sentence alignment
3.3 Utilization of translation models in CLIR
3.4 Embedding translation models into CLIR models
3.5 Alternative approaches using parallel corpora
3.5.1 Exploiting a Parallel Corpus by Pseudo-Relevance Feedback
3.5.2 Using Latent Semantic Indexing (LSI)
3.5.3 Using Comparable Corpora
3.6 Discussions on CLIR methods and resources
3.7 Mining for translation resources and relations
3.7.1 Mining for Parallel Texts
3.7.2 Transliteration
3.7.3 Mining Translations Using Hyperlinks
3.7.4 Mining Translations from Monolingual Web Pages
Chapter 4 Other Methods to Improve CLIR
4.1 Pre- and post-translation expansion
4.2 Fuzzy matching
4.3 Combining translations
4.4 Transitive translation
4.5 Integrating monolingual and translingual relations
4.6 Discussions
Chapter 5 A Look into the Future: Toward a Unified View of Monolingual IR and CLIR
5.1 What has been achieved?
5.2 Inspiring from monolingual IR
5.2.1 Parallel Between Query Expansion and Query Translation
5.2.2 Inspiring Query Translation from Query Expansion: An Example
References
Author Biography
SIMILAR VOLUMES
Search for information is no longer exclusively limited within the native language of the user, but is more and more extended to other languages. This gives rise to the problem of cross-language information retrieval (CLIR), whose goal is to find relevant information written in a different langua
The last decade has been one of dramatic progress in the field of Natural Language Processing (NLP). This hitherto largely academic discipline has found itself at the center of an information revolution ushered in by the Internet age, as demand for human-computer communication and information a
A statistical language model, or more simply a language model, is a probabilistic mechanism for generating text. Such a definition is general enough to include an endless variety of schemes. However, a distinction should be made between generative models, which can in principle be used to synthes
As online information grows dramatically, search engines such as Google are playing a more and more important role in our lives. Critical to all search engines is the problem of designing an effective retrieval model that can rank documents accurately for a given query. This has been a central resea