✦ LIBER ✦

A machine learning approach to inductive query by examples: An experiment using relevance feedback, ID3, genetic algorithms, and simulated annealing

✍ Scribed by Chen, Hsinchun; Shankaranarayanan, Ganesan; She, Linlin; Iyer, Anand

Publisher: John Wiley and Sons
Year: 1998
Tongue: English
Weight: 215 KB
Volume: 49
Category: Article
ISSN: 0002-8231
DOI: 10.1002/(sici)1097-4571(199806)49:8<693::aid-asi4>3.0.co;2-o

No coin nor oath required. For personal study only.

✦ Synopsis

Information retrieval using probabilistic techniques has attracted significant attention on the part of researchers in information and computer science over the past few decades. In the 1980s, knowledge-based techniques also made an impressive contribution to "intelligent" information retrieval and indexing. More recently, information science researchers have turned to other, newer inductive learning techniques, including symbolic learning, genetic algorithms, and simulated annealing. These newer techniques, which are grounded in diverse paradigms, have provided great opportunities for researchers to enhance the information processing and retrieval capabilities of current information systems. In this article, we first provide an overview of these newer techniques and their use in information retrieval research. To familiarize readers with the techniques, we present three promising methods: the symbolic ID3 algorithm, evolution-based genetic algorithms, and simulated annealing. We discuss their knowledge representations and algorithms in the unique context of information retrieval. An experiment using an 8000-record COMPEN database was performed to examine the performance of these inductive query-by-example techniques in comparison with that of the conventional relevance feedback method. The machine learning techniques were shown to help identify new documents that are similar to documents initially suggested by users, as well as documents that contain similar concepts. Genetic algorithms, in particular, were found to outperform relevance feedback in both document recall and precision. We believe these inductive machine learning techniques hold promise for analyzing users' preferred documents (or records), identifying users' underlying information needs, and suggesting search alternatives for database management systems and Internet applications.

1. Introduction

In the past few decades, the availability of cheap and effective storage devices and information systems has prompted the rapid growth and proliferation of relational, graphical, and textual databases. Information collection and storage efforts have become easier, but the effort required to retrieve relevant information has become significantly greater, especially in large-scale databases. This situation is particularly evident for textual databases, which are widely used in traditional library science environments, in business applications (e.g., manuals, newsletters, and electronic data interchanges), and in scientific applications (e.g., electronic community systems and scientific databases). After years of intensive use, the information stored in these databases has often become voluminous, fragmented, and unstructured. Only users with extensive subject area knowledge, system knowledge, and classification scheme knowledge are able to maneuver through and explore these textual databases.

Most commercial information retrieval systems still rely on conventional inverted index and Boolean querying techniques. Even full-text retrieval has produced less than satisfactory results. Probabilistic retrieval techniques have been used to improve the retrieval performance of information retrieval systems. Despite various extensions, however, probabilistic methodology still requires the independence assumption for terms, and it suffers from the difficulty of estimating term-occurrence parameters correctly.
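The contrast drawn above, between Boolean querying over an inverted index and probabilistic retrieval that assumes term independence, can be illustrated with a small sketch. This is not the authors' system or the COMPEN data: the five-document collection and every function name (`build_inverted_index`, `boolean_and`, `rsj_weight`, `bim_rank`) are hypothetical. For the probabilistic side it assumes one standard formulation, the binary independence model with Robertson-Sparck Jones term weights and 0.5 smoothing. Under term independence a document's score is simply the sum of per-term log-odds weights log[p(1 - q) / (q(1 - p))], where p and q are the probabilities that a term occurs in relevant and non-relevant documents. These are the term-occurrence parameters that must be estimated.

```python
import math
from collections import defaultdict

# Hypothetical toy collection: each document is reduced to a set of index terms.
docs = {
    "d1": {"genetic", "retrieval", "algorithm"},
    "d2": {"genetic", "algorithm"},
    "d3": {"retrieval", "index", "boolean"},
    "d4": {"retrieval", "index"},
    "d5": {"annealing", "optimization"},
}


def build_inverted_index(docs):
    """Map each term to the set of document ids whose term sets contain it."""
    index = defaultdict(set)
    for doc_id, terms in docs.items():
        for term in terms:
            index[term].add(doc_id)
    return index


def boolean_and(index, query_terms):
    """Conventional Boolean AND: an unranked set of docs containing every term."""
    postings = [index.get(t, set()) for t in query_terms]
    return set.intersection(*postings) if postings else set()


def rsj_weight(r, n, R, N):
    """Robertson-Sparck Jones log-odds weight with 0.5 smoothing (assumed formulation).

    r: known relevant docs containing the term, n: docs containing the term,
    R: number of known relevant docs, N: collection size.
    """
    p = (r + 0.5) / (R + 1.0)          # estimate of P(term present | relevant)
    q = (n - r + 0.5) / (N - R + 1.0)  # estimate of P(term present | non-relevant)
    return math.log((p * (1 - q)) / (q * (1 - p)))


def bim_rank(docs, index, query_terms, relevant):
    """Rank docs by summing independent per-term weights (binary independence)."""
    N, R = len(docs), len(relevant)
    weights = {}
    for t in query_terms:
        containing = index.get(t, set())
        weights[t] = rsj_weight(len(containing & relevant), len(containing), R, N)
    scores = {
        d: sum(w for t, w in weights.items() if t in terms)
        for d, terms in docs.items()
    }
    # sorted() is stable, so tied documents keep their collection order.
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


index = build_inverted_index(docs)
print(boolean_and(index, ["genetic", "retrieval"]))
for doc_id, score in bim_rank(docs, index, ["genetic", "retrieval"], {"d1"}):
    print(doc_id, round(score, 3))
```

The Boolean AND query returns only `d1`, with no ordering. The probabilistic ranking scores all five documents, and it places `d2` (which contains only "genetic") above `d3` and `d4` (which contain only "retrieval"), because "genetic" is rarer among the non-relevant documents. With no relevance judgments (R = r = 0), the weight reduces to log((N - n + 0.5) / (n + 0.5)), an idf-like quantity. This shows how strongly the ranking depends on how p and q are estimated.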