An association-based method for automatic indexing with a controlled vocabulary
Authored by Plaunt, Christian; Norgard, Barbara A.
- Publisher
- John Wiley and Sons
- Year
- 1998
- Language
- English
- File size
- 220 KB
- Volume
- 49
- Category
- Article
- ISSN
- 0002-8231
Synopsis
In this article, we describe and test a two-stage algorithm based on a lexical collocation technique which maps from the lexical clues contained in a document representation into a controlled vocabulary list of subject headings. Using a collection of 4,626 INSPEC documents, we create a "dictionary" of associations between the lexical items contained in the titles, authors, and abstracts, and controlled vocabulary subject headings assigned to those records by human indexers using a likelihood ratio statistic as the measure of association. In the deployment stage, we use the dictionary to predict which of the controlled vocabulary subject headings best describe new documents when they are presented to the system. Our evaluation of this algorithm, in which we compare the automatically assigned subject headings to the subject headings assigned to the test documents by human catalogers, shows that we can obtain results comparable to, and consistent with, human cataloging. In effect, we have cast this as a classic partial match information retrieval problem. We consider the problem to be one of "retrieving" (or assigning) the most probably "relevant" (or correct) controlled vocabulary subject headings to a document based on the clues contained in that document.

[...] records [1] and the assigned indexing (subject headings) of a large set of human-indexed catalog records, we train our algorithm to predict which subject headings have a high likelihood of being associated with new titles (and abstracts) when they are presented to an automated system. Such an approach is not conceptually without precedent (Kar & White, 1978; Maron, 1961; Maron & Kuhns, 1960), but the computational resources and statistical methods have limited the size and effectiveness of such research. For the current research, we implement this scheme using the authors, titles, abstracts, and controlled vocabulary subject headings in 4,626 catalog records from the INSPEC database on the University of California's MELVYL online catalog.

In order to "learn" the associations, we explore a "collocation" technique borrowed from computational linguistics. The training phase identifies and extracts content-bearing lexical items from elements found in bibliographic records (authors, titles, subjects, abstracts) and "collocates" (associates) them with manually-assigned subject headings (controlled vocabulary index terms). We take a broad view of "collocation" here, by which we [...]

* To whom all correspondence should be addressed.
[1] Specifically, the titles, authors, and abstracts were examined.
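
The two stages described in the synopsis (building a dictionary of term/heading associations, then using it to assign headings to a new document) can be sketched roughly as follows. This is a minimal illustration and not the authors' implementation. It assumes that the likelihood ratio statistic is the log-likelihood ratio (G^2) computed over a 2x2 term/heading contingency table, that only positive associations are kept, and that candidate headings are ranked by summing the weights of the terms found in the new document. The synopsis does not confirm any of these choices. The names train, assign, log_likelihood_ratio and the top_n parameter are hypothetical.

```python
import math
from collections import Counter, defaultdict


def _xlogx(k):
    """Return k * ln(k), treating 0 * ln(0) as 0."""
    return k * math.log(k) if k > 0 else 0.0


def log_likelihood_ratio(k11, k12, k21, k22):
    """G^2 statistic for a 2x2 contingency table of document counts.

    k11: records with both term and heading
    k12: records with the term but not the heading
    k21: records with the heading but not the term
    k22: records with neither
    """
    n = k11 + k12 + k21 + k22
    cells = _xlogx(k11) + _xlogx(k12) + _xlogx(k21) + _xlogx(k22)
    rows = _xlogx(k11 + k12) + _xlogx(k21 + k22)
    cols = _xlogx(k11 + k21) + _xlogx(k12 + k22)
    # Clamp tiny negative values caused by floating-point rounding.
    return max(0.0, 2.0 * (cells - rows - cols + _xlogx(n)))


def train(records):
    """Training stage: build the association dictionary.

    records is an iterable of (terms, headings) pairs, where terms are
    lexical items from a record and headings are its assigned subject
    headings. Returns {term: {heading: weight}}.
    """
    n_docs = 0
    term_df, head_df, pair_df = Counter(), Counter(), Counter()
    for terms, headings in records:
        n_docs += 1
        terms, headings = set(terms), set(headings)
        term_df.update(terms)
        head_df.update(headings)
        pair_df.update((t, h) for t in terms for h in headings)

    dictionary = defaultdict(dict)
    for (t, h), k11 in pair_df.items():
        k12 = term_df[t] - k11
        k21 = head_df[h] - k11
        k22 = n_docs - k11 - k12 - k21
        # Assumption: keep only pairs that co-occur more often than
        # independence would predict (observed > expected).
        if k11 * n_docs > term_df[t] * head_df[h]:
            dictionary[t][h] = log_likelihood_ratio(k11, k12, k21, k22)
    return dictionary


def assign(dictionary, terms, top_n=5):
    """Deployment stage: rank headings for a new document.

    Assumption: a heading's score is the sum of the association
    weights of the document's distinct terms.
    """
    scores = Counter()
    for t in set(terms):
        for heading, weight in dictionary.get(t, {}).items():
            scores[heading] += weight
    return [heading for heading, _ in scores.most_common(top_n)]
```

In use, train would be called once on the human-indexed records, and assign would then be called on the lexical items of each new title and abstract. Terms never seen in training contribute nothing to the ranking.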
SIMILAR VOLUMES
This paper presents an object-based approach to the construction of manufacturing cell controllers. The cell components are represented as objects and communication as messages that are passed among the objects. Messages are acted upon by selected 'methods' (procedures) that are accessible to a cell