Alleviating search uncertainty through concept associations: Automatic indexing, co-occurrence analysis, and parallel computing
✍ Scribed by Chen, Hsinchun; Martinez, Joanne; Kirchhoff, Amy; Ng, Tobun D.; Schatz, Bruce R.
- Publisher: John Wiley and Sons
- Year: 1998
- Language: English
- File size: 271 KB
- Volume: 49
- Category: Article
- ISSN: 0002-8231
✦ Synopsis
In this article, we report research on an algorithmic approach to alleviating search uncertainty in a large information space. Grounded on object filtering, automatic indexing, and co-occurrence analysis, we performed a large-scale experiment using a parallel supercomputer (SGI Power Challenge) to analyze 400,000+ abstracts in an INSPEC computer engineering collection. Two system-generated thesauri, one based on a combined object filtering and automatic indexing method, and the other based on automatic indexing only, were compared with the human-generated INSPEC subject thesaurus. Our user evaluation revealed that the system-generated thesauri were better than the INSPEC thesaurus in concept recall, but in concept precision the 3 thesauri were comparable. Our analysis also revealed that the terms suggested by the 3 thesauri were complementary and could be used to significantly increase "variety" in search terms and thereby reduce search uncertainty.

© 1998 John Wiley & Sons, Inc.

1. Introduction
Large electronic information storage and retrieval systems and databases such as online catalogs, online bibliographic databases, legal and finance databases, WWW servers, and video databases are changing the way we gather, process, and retrieve information. These systems provide a wide variety of information and services, ranging from daily updates of foreign and national news, movie reviews and clips, law cases, and financial data on companies to journal articles, books, trademarks, and statistics. However, gaining access to such information is often difficult. This is due, in large part, to the indeterminism involved in the process by which information is indexed, and to the latitude searchers have in expressing a query.

2. Using Thesauri to Alleviate Search Uncertainty: Literature Review
2.1. Indexing and Search Uncertainty
The process of indexing is partly indeterminate. Evidence suggests that different indexers, well trained in an indexing scheme, might assign index terms for a given document differently. It has also been observed that an indexer might use different terms for the same document at different times (Jacoby & Slamecka, 1962; Stevens, 1965). Search uncertainty refers to the latitude searchers have in choosing search terms. An even higher degree of uncer-