The limitations of term co-occurrence data for query expansion in document retrieval systems

✍ Scribed by Peat, Helen J.; Willett, Peter


Publisher
John Wiley and Sons
Year
1991
Tongue
English
Weight
632 KB
Volume
42
Category
Article
ISSN
0002-8231

✦ Synopsis


Term co-occurrence data has been extensively used in document retrieval systems for the identification of indexing terms that are similar to those that have been specified in a user query: these similar terms can then be used to augment the original query statement. Despite the plausibility of this approach to query expansion, the retrieval effectiveness of the expanded queries is often no greater than, or even less than, the effectiveness of the unexpanded queries. This article demonstrates that the similar terms identified by co-occurrence data in a query expansion system tend to occur very frequently in the database that is being searched. Unfortunately, frequent terms tend to discriminate poorly between relevant and nonrelevant documents, and the general effect of query expansion is thus to add terms that do little or nothing to improve the discriminatory power of the original query.
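
The expansion idea described in the synopsis can be illustrated with a minimal sketch. It is not the authors' implementation: the Dice coefficient is assumed here as the similarity measure (the synopsis does not name one), the function names are hypothetical, and the five-document corpus is made up for illustration only.

```python
from collections import Counter
from itertools import combinations

# Made-up toy corpus: each document is a list of indexing terms.
DOCS = [
    ["retrieval", "system", "index"],
    ["retrieval", "system", "query"],
    ["cluster", "system", "index"],
    ["retrieval", "thesaurus"],
    ["system", "query"],
]

def doc_frequencies(docs):
    """Count, for each term, how many documents contain it."""
    df = Counter()
    for doc in docs:
        df.update(set(doc))
    return df

def cooccurrence_counts(docs):
    """Count, for each unordered term pair, how many documents contain both."""
    pairs = Counter()
    for doc in docs:
        for a, b in combinations(sorted(set(doc)), 2):
            pairs[(a, b)] += 1
    return pairs

def dice(a, b, df, pairs):
    """Dice coefficient between two terms (an assumed similarity measure)."""
    key = (a, b) if a < b else (b, a)
    return 2 * pairs[key] / (df[a] + df[b])

def expand(query, docs, k=1):
    """Append to the query the k most similar terms for each query term."""
    df = doc_frequencies(docs)
    pairs = cooccurrence_counts(docs)
    candidates = [t for t in df if t not in query]
    added = []
    for q in query:
        # Highest similarity first; ties broken alphabetically.
        ranked = sorted(candidates, key=lambda t: (-dice(q, t, df, pairs), t))
        added.extend(t for t in ranked[:k] if t not in added)
    return list(query) + added
```

On this toy corpus, `expand(["retrieval"], DOCS)` adds "system", which is also the term that appears in the most documents (four of five). That is only an illustration of the tendency the synopsis reports, not evidence for it: a term found in most documents does little to separate relevant from nonrelevant ones.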