
Alignment of bilingual named entities in parallel corpora using statistical models and multiple knowledge sources

โœ Scribed by Lee, Chun-Jen; Chang, Jason S.; Jang, Jyh-Shing R.


Book ID: 121406207
Publisher: Association for Computing Machinery
Year: 2006
Language: English
File size: 493 KB
Volume: 5
Category: Article
ISSN: 1530-0226


Synopsis


Named entity (NE) extraction is one of the fundamental tasks in natural language processing (NLP). Although many studies have focused on identifying NEs within monolingual documents, aligning NEs in bilingual documents has not been investigated extensively due to the complexity of the task. In this article we introduce a new approach to aligning bilingual NEs in parallel corpora by incorporating statistical models with multiple knowledge sources. In our approach, we model the process of translating an English NE phrase into a Chinese equivalent using lexical translation/transliteration probabilities for word translation and alignment probabilities for word reordering. The method involves automatically learning phrase alignment and acquiring word translations from a bilingual phrase dictionary and parallel corpora, and automatically discovering transliteration transformations from a training set of name-transliteration pairs. The method also involves language-specific knowledge functions, including handling abbreviations, recognizing Chinese personal names, and expanding acronyms. At runtime, the proposed models are applied to each source NE in a pair of bilingual sentences to generate and evaluate the target NE candidates; the source and target NEs are then aligned based on the computed probabilities. Experimental results demonstrate that the proposed approach, which integrates statistical models with extra knowledge sources, is highly feasible and offers significant improvement in performance compared to our previous work, as well as the traditional approach of IBM Model 4.
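The runtime step described above (generate target NE candidates for a source NE, score them, keep the best) can be illustrated with a minimal sketch. This is not the authors' model. The lexical table, its probabilities, the floor value, the `<NULL>` token, and the function names are all hypothetical, and the pinyin tokens stand in for Chinese words. The score is a simple order-insensitive heuristic built from lexical probabilities only. It leaves out the transliteration probabilities, the alignment probabilities for reordering, and the language-specific knowledge functions that the synopsis mentions.

```python
import math

# Hypothetical toy table P(target_token | source_word); all values are invented.
# Target tokens are pinyin stand-ins for Chinese words. NULL marks a source
# word (such as "of") that has no counterpart in the target NE.
NULL = "<NULL>"
LEX_PROB = {
    ("bank", "yinhang"): 0.8,
    ("china", "zhongguo"): 0.9,
    ("of", NULL): 0.7,
}
FLOOR = 1e-6  # assumed smoothing value for unseen word pairs


def lex(src, tgt):
    """Look up a lexical probability, falling back to the floor."""
    return LEX_PROB.get((src, tgt), FLOOR)


def score_candidate(source_ne, candidate):
    """Log score of a target candidate; higher is better.

    The first term rewards covering every source word (or mapping it to NULL).
    The second term penalizes target tokens that no source word explains.
    """
    src_term = sum(
        math.log(max(lex(s, t) for t in candidate + [NULL])) for s in source_ne
    )
    tgt_term = sum(
        math.log(max(lex(s, t) for s in source_ne)) for t in candidate
    )
    return src_term + tgt_term


def best_alignment(source_ne, target_tokens, max_len=4):
    """Enumerate contiguous target spans and return (score, start, end) of the best."""
    best = None
    for start in range(len(target_tokens)):
        last = min(start + max_len, len(target_tokens))
        for end in range(start + 1, last + 1):
            score = score_candidate(source_ne, target_tokens[start:end])
            if best is None or score > best[0]:
                best = (score, start, end)
    return best


if __name__ == "__main__":
    source_ne = ["bank", "of", "china"]
    target_sentence = ["ta", "zai", "zhongguo", "yinhang", "gongzuo"]
    score, start, end = best_alignment(source_ne, target_sentence)
    print(target_sentence[start:end], round(score, 3))
```

With this toy table the selected span is the two tokens standing for "China" and "bank", in the reverse of the English order. Because the heuristic ignores word order, it tolerates this reordering but does not model it. A position-dependent alignment term would be needed to come closer to the approach the synopsis describes.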

