✦ LIBER ✦

Employing multiple representations for Chinese information retrieval

✍ Scribed by Kwok, K.L.

Publisher: John Wiley and Sons
Year: 1999
Tongue: English
Weight: 295 KB
Volume: 50
Category: Article
ISSN: 0002-8231
DOI: 10.1002/(sici)1097-4571(1999)50:8<709::aid-asi8>3.0.co;2-v


✦ Synopsis

For information retrieval in the Chinese language, three text representation methods are popular: 1-gram (character), bigram, and short-word. Each has advantages as well as drawbacks. Employing more than one method may combine their advantages and enhance retrieval effectiveness. We investigated two ways of using them simultaneously: mixing representations in documents and queries, and combining retrieval lists obtained via different representations. The experiments were done with the 170 MB evaluated Chinese corpora and 54 long and short queries available from the TREC program, using our Probabilistic Indexing and Retrieval Components System (PIRCS). The experiments show that good retrieval need not depend on accurate word segmentation; approximate segmentation into short-words will do. The results also confirm that bigram representation alone works well. Mixing characters with bigram representation boosts effectiveness further, but it is preferable to mix characters with short-word indexing, which is more efficient, needs fewer resources, and gives better retrieval more often. Combining retrieval lists from short-word with character representation and from bigram indexing provides the best retrieval results, but at a substantial cost. Some results in this paper have been published in previous conference proceedings; see (Kwok 97a,b).
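
As a rough illustration of the representations named in the synopsis, the sketch below derives character and bigram index terms from a string, mixes the two, and merges two retrieval lists. It is not the PIRCS implementation: the function names are hypothetical, short-word segmentation is omitted because it needs a lexicon, and the weighted score sum in `combine_lists` is an assumed combination rule that the synopsis does not specify.

```python
def char_terms(text: str) -> list[str]:
    """1-gram representation: each non-space character is an index term."""
    return [ch for ch in text if not ch.isspace()]


def bigram_terms(text: str) -> list[str]:
    """Bigram representation: each pair of adjacent characters is an index term."""
    chars = char_terms(text)
    return [chars[i] + chars[i + 1] for i in range(len(chars) - 1)]


def mixed_terms(text: str) -> list[str]:
    """Mixed representation: bigrams together with single characters."""
    return bigram_terms(text) + char_terms(text)


def combine_lists(
    list_a: dict[str, float],
    list_b: dict[str, float],
    weight_a: float = 0.5,
) -> list[tuple[str, float]]:
    """Merge two retrieval lists (document id -> score).

    Assumption: scores are comparable and are combined by a weighted sum;
    a document missing from one list contributes 0.0 from that list.
    """
    doc_ids = set(list_a) | set(list_b)
    combined = {
        doc: weight_a * list_a.get(doc, 0.0) + (1.0 - weight_a) * list_b.get(doc, 0.0)
        for doc in doc_ids
    }
    # Highest combined score first; ties broken by document id.
    return sorted(combined.items(), key=lambda item: (-item[1], item[0]))
```

For the four-character string "信息检索", `char_terms` yields four terms and `bigram_terms` yields three overlapping pairs, which shows why bigram indexing needs no word segmentation but produces a larger index than characters alone.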

📜 SIMILAR VOLUMES

A “stereo” document representation for textual information retrieval

✍ Liang Chen; Jia Zeng; Naoyuki Tokuda 📂 Article 📅 2006 🏛 John Wiley and Sons 🌐 English ⚖ 234 KB

A new document representation model is presented in this paper. This model is based on the idea of representing a document by two or more __pictures__ of the document taken from different perspectives. It is shown that by applying the __stereo__ representation model, enhanced textual re

A spoken-access approach for Chinese text and speech information retrieval

✍ Chien, Lee-Feng; Wang, Hsin-Min; Bai, Bo-Ren; Lin, Sun-Chien 📂 Article 📅 2000 🏛 John Wiley and Sons 🌐 English ⚖ 189 KB

This paper presents an efficient spoken-access approach for both Chinese text and Mandarin speech information retrieval. The proposed approach is developed not only to deal with the retrieval of spoken documents, but also to improve the capability of human-computer interaction via voice input for inf

A lexical knowledge base approach for English–Chinese cross-language information retrieval

✍ Jiangping Chen 📂 Article 📅 2005 🏛 John Wiley and Sons 🌐 English ⚖ 209 KB

This study proposes and explores a natural language processing (NLP)-based strategy to address out‐of‐dictionary and vocabulary mismatch problems in query-translation-based English–Chinese Cross‐Language Information Retrieval (EC‐CLIR). The strategy, named the __LKB approach__, is to c