An association-based method for automatic indexing with a controlled vocabulary

✍ Scribed by Plaunt, Christian; Norgard, Barbara A.


Publisher: John Wiley and Sons
Year: 1998
Tongue: English
Weight: 220 KB
Volume: 49
Category: Article
ISSN: 0002-8231

✦ Synopsis


In this article, we describe and test a two-stage algorithm based on a lexical collocation technique which maps from the lexical clues contained in a document representation into a controlled vocabulary list of subject headings. Using a collection of 4,626 INSPEC documents, we create a "dictionary" of associations between the lexical items contained in the titles, authors, and abstracts, and controlled vocabulary subject headings assigned to those records by human indexers using a likelihood ratio statistic as the measure of association. In the deployment stage, we use the dictionary to predict which of the controlled vocabulary subject headings best describe new documents when they are presented to the system. Our evaluation of this algorithm, in which we compare the automatically assigned subject headings to the subject headings assigned to the test documents by human catalogers, shows that we can obtain results comparable to, and consistent with, human cataloging. In effect, we have cast this as a classic partial match information retrieval problem. We consider the problem to be one of "retrieving" (or assigning) the most probably "relevant" (or correct) controlled vocabulary subject headings to a document based on the clues contained in that document.

[...] records 1 and the assigned indexing (subject headings) of a large set of human-indexed catalog records, we train our algorithm to predict which subject headings have a high likelihood of being associated with new titles (and abstracts) when they are presented to an automated system.

Such an approach is not conceptually without precedent (Kar & White, 1978; Maron, 1961; Maron & Kuhns, 1960), but the computational resources and statistical methods have limited the size and effectiveness of such research. For the current research, we implement this scheme using the authors, titles, abstracts, and controlled vocabulary subject headings in 4,626 catalog records from the INSPEC database on the University of California's MELVYL online catalog.

In order to "learn" the associations, we explore a "collocation" technique borrowed from computational linguistics. The training phase identifies and extracts content-bearing lexical items from elements found in bibliographic records (authors, titles, subjects, abstracts) and "collocates" (associates) them with manually-assigned subject headings (controlled vocabulary index terms). We take a broad view of "collocation" here, by which we [...]

* To whom all correspondence should be addressed.

1 Specifically, the titles, authors, and abstracts were examined.
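
The two-stage scheme described in the synopsis (build a dictionary of term-to-heading associations, then use it to rank headings for a new document) can be illustrated with a minimal sketch. Everything concrete here is an assumption rather than the authors' implementation: the synopsis names only "a likelihood ratio statistic", so the sketch uses the common G^2 form over a 2x2 contingency table; the function names `g2`, `train`, and `assign`, the summing of association weights at deployment, and the toy records are all hypothetical.

```python
import math
from collections import Counter, defaultdict


def _xlogx(x):
    # Convention: 0 * log(0) is treated as 0.
    return x * math.log(x) if x > 0 else 0.0


def g2(k11, k12, k21, k22):
    """Log-likelihood ratio (G^2) for a 2x2 table of record counts.

    k11: records with both term and heading, k12: term only,
    k21: heading only, k22: neither. (Assumed form of the statistic.)
    """
    n = k11 + k12 + k21 + k22
    cells = _xlogx(k11) + _xlogx(k12) + _xlogx(k21) + _xlogx(k22)
    rows = _xlogx(k11 + k12) + _xlogx(k21 + k22)
    cols = _xlogx(k11 + k21) + _xlogx(k12 + k22)
    # Clamp tiny negative values caused by floating-point rounding.
    return max(0.0, 2.0 * (cells - rows - cols + _xlogx(n)))


def train(records):
    """Training stage: build {term: {heading: association weight}}.

    records is an iterable of (terms, headings) pairs of strings.
    """
    n = 0
    term_df, head_df, pair_df = Counter(), Counter(), Counter()
    for terms, headings in records:
        n += 1
        terms, headings = set(terms), set(headings)
        term_df.update(terms)
        head_df.update(headings)
        pair_df.update((t, h) for t in terms for h in headings)

    dictionary = defaultdict(dict)
    for (t, h), k11 in pair_df.items():
        k12 = term_df[t] - k11
        k21 = head_df[h] - k11
        k22 = n - k11 - k12 - k21
        # Keep only pairs that co-occur more often than chance predicts.
        if k11 * n > term_df[t] * head_df[h]:
            dictionary[t][h] = g2(k11, k12, k21, k22)
    return dictionary


def assign(dictionary, terms, top_k=3):
    """Deployment stage: rank headings by summed association weight.

    Summing weights is an illustrative choice, not taken from the source.
    """
    scores = Counter()
    for t in set(terms):
        for h, weight in dictionary.get(t, {}).items():
            scores[h] += weight
    return [h for h, _ in scores.most_common(top_k)]


# Hypothetical toy records: (lexical items, human-assigned headings).
records = [
    (["neural", "network", "training"], ["Neural nets"]),
    (["neural", "learning"], ["Neural nets"]),
    (["database", "query", "index"], ["Database systems"]),
    (["query", "optimization"], ["Database systems"]),
]
dictionary = train(records)
print(assign(dictionary, ["neural", "network"], top_k=1))
```

Counting each term and heading once per record keeps the contingency table in units of records, which matches the idea of associating lexical items with headings across a collection of catalog records.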


📜 SIMILAR VOLUMES


An object-based representation method fo
✍ Oded Z. Maimon; Edward L. Fisher 📂 Article 📅 1988 🏛 Elsevier Science 🌐 English ⚖ 926 KB

This paper presents an object-based approach to the construction of manufacturing cell controllers. The cell components are represented as objects and communication as messages that are passed among the objects. Messages are acted upon by selected 'methods' (procedures) that are accessible to a cell